Title

Recombinant Megalomicin Biosynthetic Genes And Uses Thereof

Cross-Reference to Priority Application 
This application claims priority to provisional U.S. patent application
Serial No. 60/158,305, filed 8 October 1999, and provisional U.S. patent
application Serial No. 60/190,024, filed 17 March 2000, under 35 U.S.C. § 119(e).
The contents of the above-referenced applications are incorporated herein by
reference in their entirety.



Field of the Invention
The present invention provides recombinant methods and materials for
producing polyketides by recombinant DNA technology. The invention relates to
the fields of agriculture, animal husbandry, chemistry, medicinal chemistry,
medicine, molecular biology, pharmacology, and veterinary technology.



Background of the Invention 
Polyketides represent a large family of diverse compounds synthesized
from 2-carbon units through a series of condensations and subsequent
modifications. Polyketides occur in many types of organisms, including fungi and
mycelial bacteria, in particular, the actinomycetes. There is a wide variety of
polyketide structures, and the class of polyketides encompasses numerous
compounds with diverse activities. Erythromycin, FK-506, FK-520, megalomicin,
narbomycin, oleandomycin, picromycin, rapamycin, spinosyn, and tylosin are
examples of such compounds. Given the difficulty in producing polyketide
compounds by traditional chemical methodology, and the typically low production
of polyketides in wild-type cells, there has been considerable interest in finding
improved or alternate means to produce polyketide compounds. See PCT
publication Nos. WO 93/13663; WO 95/08548; WO 96/40968; WO 97/02358;
and WO 98/27203; United States Patent Nos. 4,874,748; 5,063,155; 5,098,837;
5,149,639; 5,672,491; and 5,712,146; Fu et al., 1994, Biochemistry 33: 9321-9326;
McDaniel et al., 1993, Science 262: 1546-1550; and Rohr, 1995, Angew.
Chem. Int. Ed. Engl. 34(8): 881-888, each of which is incorporated herein by
reference.

Polyketides are synthesized in nature by polyketide synthase (PKS)
enzymes. These enzymes, which are complexes of multiple large proteins, are
similar to the synthases that catalyze condensation of 2-carbon units in the
biosynthesis of fatty acids. PKS enzymes are encoded by PKS genes that usually
consist of three or more open reading frames (ORFs). Two major types of PKS
enzymes are known; these differ in their composition and mode of synthesis.
These two major types of PKS enzymes are commonly referred to as Type I or
"modular" and Type II or "iterative" PKS enzymes.

Modular PKSs are responsible for producing a large number of 12-, 14-,
and 16-membered macrolide antibiotics including erythromycin, megalomicin,
methymycin, narbomycin, oleandomycin, picromycin, and tylosin. Each ORF of a
modular PKS can comprise one, two, or more "modules" of ketosynthase activity,
each module of which consists of at least two (if a loading module) and more
typically three (for the simplest extender module) or more enzymatic activities or
"domains." These large multifunctional enzymes (>300 kDa) catalyze the
biosynthesis of polyketide macrolactones through multistep pathways involving
decarboxylative condensations between acyl thioesters followed by cycles of
varying β-carbon processing activities (see O'Hagan, D., The Polyketide
Metabolites; E. Horwood: New York, 1991, incorporated herein by reference).
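The module and domain organization described above can be pictured with a small data model. The following is only an illustrative sketch, not part of the disclosed methods: the class name `Module`, the helper `processing_domains`, and the rule that a loading module carries at least two domains while an extender module carries at least KS, AT, and ACP are assumptions made for the example, using the domain abbreviations listed in the Summary of the Invention.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption for illustration: the simplest extender module carries these three domains.
MINIMAL_EXTENDER = {"KS", "AT", "ACP"}
# Optional beta-carbon processing domains, in the order they act.
PROCESSING_ORDER = ("KR", "DH", "ER")


@dataclass(frozen=True)
class Module:
    """Hypothetical representation of one PKS module as an ordered set of domains."""
    name: str
    domains: Tuple[str, ...]
    is_loading: bool = False

    def is_valid(self) -> bool:
        # A loading module needs at least two domains; an extender module
        # needs at least the three minimal domains.
        if self.is_loading:
            return len(set(self.domains)) >= 2
        return MINIMAL_EXTENDER <= set(self.domains)


def processing_domains(module: Module) -> List[str]:
    """Return the beta-carbon processing domains present, in KR, DH, ER order."""
    return [d for d in PROCESSING_ORDER if d in module.domains]


# Hypothetical example modules (not taken from the sequence listing).
loading = Module("loading", ("AT", "ACP"), is_loading=True)
extender = Module("extender", ("KS", "AT", "KR", "ACP"))
```

In this sketch, varying the degree of β-carbon processing corresponds to changing which of the KR, DH, and ER entries a module carries.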

During the past half decade, the study of modular PKS function and
specificity has been greatly facilitated by the plasmid-based Streptomyces
coelicolor expression system developed with the 6-deoxyerythronolide B (6-dEB)
synthase (DEBS) genes (see Kao et al., 1994, Science 265: 509-512; McDaniel et
al., 1993, Science 262: 1546-1557; and U.S. Patent Nos. 5,672,491 and
5,712,146, each of which is incorporated herein by reference). The advantages of
this plasmid-based genetic system for DEBS are that it overcomes the tedious and
limited techniques for manipulating the natural DEBS host organism,
Saccharopolyspora erythraea, allows more facile construction of recombinant
PKSs, and reduces the complexity of PKS analysis by providing a "clean" host
background. This system also expedited construction of the first combinatorial
modular polyketide library in Streptomyces (see PCT publication No. WO
98/49315, incorporated herein by reference).

The ability to control aspects of polyketide biosynthesis, such as monomer
selection and degree of β-carbon processing, by genetic manipulation of PKSs has
stimulated great interest in the combinatorial engineering of novel antibiotics (see
Hutchinson, 1998, Curr. Opin. Microbiol. 1: 319-329; Carreras and Santi, 1998,
Curr. Opin. Biotech. 9: 403-411; and U.S. Patent Nos. 5,712,146 and 5,672,491,
each of which is incorporated herein by reference). This interest has resulted in the
cloning, analysis, and manipulation by recombinant DNA technology of genes that
encode PKS enzymes. The resulting technology allows one to manipulate a known
PKS gene cluster either to produce the polyketide synthesized by that PKS at
higher levels than occur in nature or in hosts that otherwise do not produce the
polyketide. The technology also allows one to produce molecules that are
structurally related to, but distinct from, the polyketides produced from known
PKS gene clusters.

Megalomicin is a macrolide antibiotic produced by Micromonospora
megalomicea, a member of the Actinomycetales family of soil bacteria that
produces many types of biologically active compounds. Megalomicin is a
glycoside of erythromycin A, a widely used antibacterial drug with little or no
antimalarial activity. Megalomicin has antibacterial properties similar to those of
erythromycin, and in 1998, it was discovered also to have potent antiparasitic
activity and low toxicity. The antiparasitic activity may be related to the effect
megalomicin has on protein trafficking in eukaryotes, where it appears to inhibit
vesicular transport between the medial and trans-Golgi, resulting in
under-sialylation of proteins. Hence, megalomicin offers an exciting opportunity to
develop a new class of antiparasitic drugs with a different mechanism of action
than the drugs currently in use and, therefore, possibly active against drug-resistant
forms of Plasmodium falciparum.

The number and diversity of megalomicin derivatives have been limited
due to the inability to manipulate the PKS genes, which have not previously been
available in recombinant form. Genetic systems that allow rapid engineering of the
megalomicin biosynthetic genes would be valuable for creating novel compounds
for pharmaceutical, agricultural, and veterinary applications. The production of
such compounds could be more readily accomplished if the heterologous
expression of the megalomicin biosynthetic genes in Streptomyces coelicolor and
S. lividans and other host cells were possible. The present invention meets these
and other needs.

Summary of the Invention 
The present invention provides recombinant methods and materials for
expressing PKS enzymes and polyketide modification enzymes derived in whole
and in part from the megalomicin biosynthetic genes in recombinant host cells.
The invention also provides the polyketides produced by such PKS enzymes. The
invention provides in recombinant form all of the genes for the proteins that
constitute the complete PKS that ultimately results, in Micromonospora
megalomicea, in the production of megalomicin. Thus, in one embodiment, the
invention is directed to recombinant materials comprising nucleic acids with
nucleotide sequences encoding at least one domain, module, or protein encoded by
a megalomicin PKS gene. In one preferred embodiment of the invention, the DNA
compounds of the invention comprise a coding sequence for at least one and
preferably two or more of the domains of the loading module and extender
modules 1 through 6, inclusive, of the megalomicin PKS.

In one embodiment, the invention provides a recombinant expression
vector that comprises a heterologous promoter positioned to drive expression of
one or more of the megalomicin biosynthetic genes. In a preferred embodiment,
the promoter is derived from another PKS gene. In a related embodiment, the
invention provides recombinant host cells comprising one or more expression
vectors that produce(s) megalomicin or a megalomicin derivative or precursor. In
a preferred embodiment, the host cell is Streptomyces lividans or S. coelicolor.

In another embodiment, the invention provides a recombinant expression
vector that comprises a promoter positioned to drive expression of a hybrid PKS
comprising all or part of the megalomicin PKS and at least a part of a second PKS.
In a related embodiment, the invention provides recombinant host cells
comprising the vector that produces the hybrid PKS and its corresponding
polyketide. In a preferred embodiment, the host cell is Streptomyces lividans or S.
coelicolor.
In a related embodiment, the invention provides recombinant materials for
the production of libraries of polyketides wherein the polyketide members of the
library are synthesized by hybrid PKS enzymes of the invention. The resulting
polyketides can be further modified to convert them to other useful compounds,
such as antibiotics, motilides, and antiparasitics, typically through hydroxylation
and/or glycosylation. Modified macrolides provided by the invention that are
useful intermediates in the preparation of antiparasitics are of particular benefit.

In another related embodiment, the invention provides a method to prepare
a nucleic acid that encodes a modified PKS, which method comprises using the
megalomicin PKS encoding sequence as a scaffold and modifying the portions of
the nucleotide sequence that encode enzymatic activities, either by mutagenesis,
inactivation, deletion, insertion, or replacement. The thus modified megalomicin
PKS encoding nucleotide sequence can then be expressed in a suitable host cell
and the cell employed to produce a polyketide different from that produced by the
megalomicin PKS. In addition, portions of the megalomicin PKS coding sequence
can be inserted into other PKS coding sequences to modify the products thereof.

In another related embodiment, the invention is directed to a multiplicity of
cell colonies, constituting a library of colonies, wherein each colony of the library
contains an expression vector for the production of a modular PKS derived in
whole or in part from the megalomicin PKS. Thus, at least a portion of the
modular PKS is identical to that found in the PKS that produces megalomicin and
is identifiable as such. The derived portion can be prepared synthetically or
directly from DNA derived from organisms that produce megalomicin. In
addition, the invention provides methods to screen the resulting polyketide and
antibiotic libraries.

The invention also provides novel polyketides, motilides, antibiotics,
antiparasitics and other useful compounds derived therefrom. The compounds of
the invention can also be used in the manufacture of another compound. In a
preferred embodiment, the compounds of the invention are formulated in a
mixture or solution for administration to an animal or human.

In a specific embodiment, the invention provides an isolated nucleic acid
fragment comprising a nucleotide sequence encoding a domain of megalomicin
polyketide synthase (PKS) or a megalomicin modification enzyme. The isolated
nucleic acid fragment can be a DNA or an RNA. Preferably, the isolated nucleic
acid fragment is a recombinant DNA compound.

The isolated nucleic acid fragment can comprise a single, multiple or all
the open reading frame(s) (ORF) of the megalomicin PKS or a megalomicin
modification enzyme. Exemplary ORFs of megalomicin PKS include the ORFs of
the megAI, megAII and megAIII genes. The isolated nucleic acid fragment can
also encode a single, multiple, or all of the domains of the megalomicin PKS.
Exemplary domains of the megalomicin PKS include a TE domain, a KS domain,
an AT domain, an ACP domain, a KR domain, a DH domain and an ER domain.
In a preferred embodiment, the nucleic acid fragment encodes a module of the
megalomicin PKS. In another preferred embodiment, the nucleic acid fragment
encodes the loading module, a thioesterase domain, and all six extender modules
of the megalomicin PKS.

Megalomicin modification enzymes include those enzymes involved in the
conversion of 6-dEB into a megalomicin such as the enzymes encoded by the
megF, megBV, megCIII, megK, megDI and megG (renamed megY) genes.
Megalomicin modification enzymes also include those enzymes involved in the
biosynthesis of mycarose, megosamine or desosamine, which are used as
biosynthetic intermediates in the biosynthesis of various megalomicin species and
other related polyketides. The enzymes that are involved in biosynthesis of
mycarose, megosamine or desosamine are described in Figures 5 and 10.

In a preferred embodiment, the invention provides an isolated nucleic acid
fragment which hybridizes to a nucleic acid having a nucleotide sequence set forth
in SEQ. ID NO: 1, under low, medium or high stringency. More preferably, the
nucleic acid fragment comprises, consists of, or consists essentially of a nucleic acid
having a nucleotide sequence set forth in SEQ. ID NO: 1.

In another specific embodiment, the invention provides a substantially
purified polypeptide, which is encoded by a nucleic acid fragment comprising a
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or
megalomicin modification enzyme. Functional fragments, analogs or derivatives
of the megalomicin PKS or megalomicin modification enzyme polypeptides are
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives
comprise an amino acid sequence that has at least 60% identity, more preferably at
least 90% identity, to their wild type counterparts.

In still another specific embodiment, the invention provides an antibody, or
a fragment or derivative thereof, which immuno-specifically binds to a domain of
megalomicin polyketide synthase (PKS) or a megalomicin modification enzyme.
The antibody can be a monoclonal or polyclonal antibody or an antibody fragment.
Preferably, the antibody is a monoclonal antibody.

In yet another specific embodiment, the invention provides a recombinant
DNA expression vector comprising the recombinant DNA compound encoding at
least a domain of the megalomicin PKS or a megalomicin modification enzyme,
wherein said domain is operably linked to a promoter. Preferably, the
recombinant DNA expression vector further comprises an origin of replication or a
segment of DNA that enables chromosomal integration.

In yet another specific embodiment, the invention provides a recombinant
host cell comprising the above-described recombinant DNA expression vector
encoding at least a domain of megalomicin PKS or the megalomicin modification
enzyme. The recombinant host cells can be any suitable host cells including
animal, mammalian, plant, fungal, yeast, and bacterial cells. Preferably, the
recombinant host cells are Streptomyces cells, such as Streptomyces lividans and
S. coelicolor cells, or Saccharopolyspora cells, such as Saccharopolyspora erythraea
cells. Also preferably, the recombinant host cells do not produce megalomicin in
their untransformed, non-recombinant state.

When the recombinant host cell contains nucleic acid encoding more than
one megalomicin PKS or megalomicin modification enzyme, or domains thereof,
such nucleic acid material can be located at a single genetic locus, e.g., on a single
plasmid or at a single chromosomal locus, or at different genetic loci, e.g., on
separate plasmids and/or chromosomal loci. In one example, the invention
provides a recombinant host cell that comprises at least two separate
autonomously replicating recombinant DNA expression vectors, each of which
comprises a recombinant DNA compound encoding a megalomicin PKS
domain or a megalomicin modification enzyme operably linked to a promoter. In
another example, the invention provides a recombinant host cell that comprises
at least one autonomously replicating recombinant DNA expression vector and at
least one modified chromosome, wherein each of said vector(s) and each of said
modified chromosome(s) comprises a recombinant DNA compound encoding a
megalomicin PKS domain or a megalomicin modification enzyme operably linked
to a promoter. Preferably, the autonomously replicating recombinant DNA
expression vector and/or the modified chromosome further comprise distinct
selectable markers.

In a preferred embodiment, the cell comprises three different vectors, one
of which is integrated into the chromosome and two of which are autonomously
replicating, and each of the vectors comprises a meg PKS gene. Optionally, one or
more of the meg PKS genes contains one or more domain alterations, such as a
deletion or substitution of a meg PKS domain with a domain from another PKS.

In yet another specific embodiment, the invention provides a hybrid PKS,
which is produced from a recombinant gene that comprises at least a portion of a
megalomicin PKS gene and at least a portion of a second PKS gene for a
polyketide other than megalomicin. For example, and without limitation, the
second PKS gene can be a narbonolide PKS gene, an oleandolide PKS gene, or a
rapamycin PKS gene. In one embodiment, the hybrid PKS is composed of a
loading module and six extender modules, wherein at least one domain of any one
of extender modules 1 through 6, inclusive, is a domain of an extender module of
megalomicin PKS. In another preferred embodiment, the hybrid PKS comprises a
megalomicin PKS that has a non-functional KS domain in module 1.

In yet another specific embodiment, the invention provides a method of
producing a polyketide, which method comprises growing the recombinant host
cell comprising a recombinant DNA expression vector encoding at least a domain
of the megalomicin PKS or a megalomicin modification enzyme under conditions
whereby the megalomicin PKS domain or the megalomicin modification enzyme
comprised by the recombinant expression vector is produced and the polyketide is
synthesized by the cell, and recovering the synthesized polyketide. Preferably, the
recombinant host cell comprises a recombinant expression vector that encodes at
least a portion of a megAI, megAII, or megAIII gene.

These and other embodiments of the invention are described in more detail
in the following description, the examples, and claims set forth below.

Brief Description of the Figures 
Figure 1 shows restriction site and function maps of the insert DNA in
cosmids pKOS079-138B, pKOS079-93D, pKOS079-93A, and pKOS079-124B of
the invention. Various restriction sites (XhoI, BglII, NsiI) are also shown. The
location of the megalomicin biosynthetic genes is shown below the solid lines
indicating the cosmid inserts. The genes are shown as arrows pointing in the
direction of transcription. The approximate size (in kilobase (kb) pairs) of the gene
cluster is indicated in 5000 bp increments (i.e., 5K, 10K, and the like) on a solid
bar beneath the arrows indicating the genes.

Figure 2 shows a more detailed map of the megalomicin biosynthetic gene
cluster. The various open reading frames are shown as arrows pointing in the
direction of transcription. A line indicates the size in base pairs (in 1000 bp
increments) of the gene cluster. The various domains of the megalomicin PKS are
also shown. Other genes of the megalomicin biosynthetic gene cluster not shown
in this Figure are located in the insert DNA of cosmids pKOS079-138B and
pKOS079-124B.

Figure 3 shows the structures of the megalomicins, azithromycin and
erythromycin A.

Figure 4 shows the modules and domains of DEBS and the megalomicin
PKS.

Figure 5 shows the compounds and reactions in the erythromycin
biosynthetic pathway and also for megalomicin biosynthesis. Genes that produce
the various enzymes that catalyze each of the steps in the biosynthetic pathway are
indicated.

Figure 6 shows the biosynthetic pathway for the formation of desosamine,
rhodosamine, and mycarose, as well as the genes that produce the various enzymes
that catalyze each of the steps in the biosynthetic pathway.

Figure 7 depicts nucleotide and amino acid sequence of Micromonospora
megalomicea megalomicin biosynthetic genes (GenBank Accession No.
AF263245, incorporated herein by reference).

Figure 8 depicts the biosynthesis of the erythromycins and megalomicins
and the enzymes that mediate the biosynthesis of each.

Figure 9 depicts the cloned megalomicin biosynthetic gene cluster and
certain cosmids of the invention that comprise portions of the cluster.

Figure 10 depicts the biosynthesis of megosamine, mycarose, and
desosamine.

Detailed Description of the Invention 
The present invention provides useful compounds and methods for
producing polyketides in recombinant host cells. As used herein, the term
recombinant refers to a compound or composition produced by human
intervention. The invention provides recombinant DNA compounds encoding all
or a portion of the megalomicin biosynthetic genes. The invention provides
recombinant expression vectors useful in producing the megalomicin PKS and
hybrid PKSs composed of a portion of the megalomicin PKS in recombinant host
cells. The invention also provides the polyketides produced by the recombinant
PKS and polyketide modification enzymes.

To appreciate the many and diverse benefits and applications of the
invention, the description of the invention below is organized as follows. In
Section I, common definitions used throughout this application are provided. In
Section II, structural and functional characteristics of megalomicin are described.
In Section III, the recombinant megalomicin biosynthetic genes and other
recombinant nucleic acids provided by the invention are described. In Section IV,
polypeptides and proteins encoded by the megalomicin biosynthetic genes and
antibodies that specifically bind to such polypeptides and proteins provided by the
invention are described. In Section V, methods for heterologous expression of the
megalomicin biosynthetic genes provided by the invention are described. In
Section VI, the hybrid PKS genes provided by the invention are described. In
Section VII, host cells containing multiple megalomicin biosynthetic genes and
nucleic acid fragments on separate expression vectors provided by the invention are
described. In Section VIII, the polyketide compounds provided by the invention
and pharmaceutical compositions of those compounds are described. The detailed
description is followed by working examples illustrating the invention.
Unless defined otherwise, all technical and scientific terms used herein
have the same meaning as is commonly understood by one of ordinary skill in the
art to which this invention belongs. All patents, applications, published
applications and other publications and sequences from GenBank and other
databases referred to herein are incorporated by reference in their entirety.

Section I. Definitions

As used herein, domain refers to a portion of a molecule, e.g., proteins or
nucleic acids, that is structurally and/or functionally distinct from another portion
of the molecule.

As used herein, antibody includes antibody fragments, such as Fab 
fragments, which are composed of a light chain and the variable region of a heavy 
chain. 

As used herein, biological activity refers to the in vivo activities of a
compound or physiological responses that result upon in vivo administration of a
compound, composition or other mixture. Biological activity, thus, encompasses
therapeutic effects and pharmaceutical activity of such compounds, compositions
and mixtures. Biological activities may be observed in in vitro systems designed
to test or use such activities.

As used herein, a combination refers to any association between two or
among more items.

As used herein, a composition refers to any mixture. It may be a solution,
a suspension, liquid, powder, a paste, aqueous, non-aqueous or any combination
thereof.

As used herein, derivative or analog of a molecule refers to a portion
derived from or a modified version of the molecule.

As used herein, operably linked, operatively linked or operationally
associated refers to the functional relationship of DNA with regulatory and
effector sequences of nucleotides, such as promoters, enhancers, transcriptional
and translational stop sites, and other signal sequences. For example, operative
linkage of DNA to a promoter refers to the physical and functional relationship
between the DNA and the promoter such that the transcription of such DNA is
initiated from the promoter by an RNA polymerase that specifically recognizes,
binds to and transcribes the DNA. To optimize expression and/or in vitro
transcription, it may be helpful to remove, add or alter 5' untranslated portions of
the clones to eliminate extra, potentially inappropriate alternative translation
initiation (i.e., start) codons or other sequences that may interfere with or reduce
expression, either at the level of transcription or translation. Alternatively,
consensus ribosome binding sites (see, e.g., Kozak, J. Biol. Chem., 266:19867-
19870 (1991)) can be inserted immediately 5' of the start codon and may enhance
expression. The desirability of (or need for) such modification may be empirically
determined.

As used herein, pharmaceutically acceptable salts, esters or other
derivatives of the conjugates include any salts, esters or derivatives that may be
readily prepared by those of skill in this art using known methods for such
derivatization and that produce compounds that may be administered to animals or
humans without substantial toxic effects and that either are pharmaceutically
active or are prodrugs.

As used herein, a promoter region or promoter element refers to a segment
of DNA or RNA that controls transcription of the DNA or RNA to which it is
operatively linked. The promoter region includes specific sequences that are
sufficient for RNA polymerase recognition, binding and transcription initiation.
This portion of the promoter region is referred to as the promoter. In addition, the
promoter region includes sequences that modulate this recognition, binding and
transcription initiation activity of RNA polymerase. These sequences may be cis
acting or may be responsive to trans acting factors. Promoters, depending upon
the nature of the regulation, may be constitutive or regulated.

As used herein, stringency of hybridization in determining percentage
mismatch is as follows: (1) high stringency: 0.1 x SSPE, 0.1% SDS, 65°C; (2)
medium stringency: 0.2 x SSPE, 0.1% SDS, 50°C; and (3) low stringency: 1.0 x
SSPE, 0.1% SDS, 50°C. Equivalent stringencies may be achieved using alternative
buffers, salts and temperatures.
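For readers who track such conditions in software, the three stringency levels defined above can be captured in a small lookup table. This is a minimal sketch for illustration only: the names `WashCondition`, `STRINGENCY`, and `is_at_least_as_stringent` are hypothetical, and the comparison rule (lower salt and higher temperature are more stringent) is the general principle of nucleic acid hybridization rather than anything specified in this application.

```python
from typing import NamedTuple


class WashCondition(NamedTuple):
    """One hybridization wash condition (hypothetical container for this example)."""
    sspe_x: float       # SSPE concentration, as a multiple of 1x
    sds_percent: float  # SDS concentration in percent
    temp_c: int         # temperature in degrees Celsius


# Values copied from the definition of stringency in the text.
STRINGENCY = {
    "high": WashCondition(sspe_x=0.1, sds_percent=0.1, temp_c=65),
    "medium": WashCondition(sspe_x=0.2, sds_percent=0.1, temp_c=50),
    "low": WashCondition(sspe_x=1.0, sds_percent=0.1, temp_c=50),
}


def is_at_least_as_stringent(a: str, b: str) -> bool:
    """Return True if level `a` is at least as stringent as level `b`.

    Assumes lower salt and higher temperature both increase stringency.
    """
    ca, cb = STRINGENCY[a], STRINGENCY[b]
    return ca.sspe_x <= cb.sspe_x and ca.temp_c >= cb.temp_c
```

Under this rule the three levels order as expected: high over medium, and medium over low.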
The term substantially identical or homologous or similar varies with the
context as understood by those skilled in the relevant art and generally means at
least 70%, preferably means at least 80%, more preferably at least 90%, and most
preferably at least 95% identity.
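The identity thresholds above can be applied to a pair of pre-aligned sequences as in the following sketch. It is offered as an illustration only. The function names, the tier labels, and the choice of denominator (all aligned columns except those where both sequences have a gap) are assumptions; the application does not specify how percent identity is to be computed, and other conventions exist.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns, skipping columns that are gaps in both."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Thresholds from the text, highest first; the labels are illustrative.
TIERS = [
    (95.0, "most preferred"),
    (90.0, "more preferred"),
    (80.0, "preferred"),
    (70.0, "substantially identical"),
]


def identity_tier(pct: float) -> Optional[str]:
    """Return the highest tier whose threshold `pct` meets, or None below 70%."""
    for threshold, label in TIERS:
        if pct >= threshold:
            return label
    return None
```

For example, two aligned ten-residue sequences that differ at one position score 90% and fall in the "more preferred" tier of this sketch.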
As used herein, substantially identical to a product means sufficiently
similar so that the property of interest is sufficiently unchanged so that the
substantially identical product can be used in place of the product.

As used herein, isolated means that a substance is either present in a
preparation at a concentration higher than that at which the substance is found in
nature or in its naturally occurring state or that the substance is present in a
preparation that contains other materials with which the substance is not
associated in nature. As an example of the latter, an isolated meg PKS protein
includes a meg PKS protein expressed in a Streptomyces coelicolor or S. lividans
host cell.

As used herein, substantially pure means sufficiently homogeneous to
appear free of readily detectable impurities as determined by standard methods of
analysis, such as thin layer chromatography (TLC), gel electrophoresis and high
performance liquid chromatography (HPLC), used by those of skill in the art to
assess such purity, or sufficiently pure such that further purification would not
detectably alter the physical and chemical properties, such as enzymatic and
biological activities, of the substance. Methods for purification of the compounds
to produce substantially chemically pure compounds are known to those of skill in
the art. A substantially chemically pure compound may, however, be a mixture of
stereoisomers or isomers. In such instances, further purification might increase
the specific activity of the compound.
As used herein, vector or plasmid refers to discrete elements that are used
to introduce heterologous DNA into cells for either expression or replication
thereof. Selection and use of such vehicles are well known within the skill of the
artisan. An expression vector includes vectors capable of expressing DNAs that
are operatively linked with regulatory sequences, such as promoter regions, that
are capable of effecting expression of such DNA fragments. Thus, an expression
vector refers to a recombinant DNA or RNA construct, such as a plasmid, a phage,
recombinant virus or other vector that, upon introduction into an appropriate host
cell, results in expression of the cloned DNA. Appropriate expression vectors are
well known to those of skill in the art and include those that are replicable in
eukaryotic cells and/or prokaryotic cells and those that remain episomal or those
which integrate into the host cell genome.

Section II. Megalomicins 

The megalomicins were discovered in 1969 at Schering Corp. as
antibacterial agents produced by Micromonospora megalomicea (see Weinstein et
al., 1969, J. Antibiotics 22: 253-258, and U.S. Patent No. 3,632,750, both of
which are incorporated herein by reference). Although the initial structural
assignment was in error, a thorough reassessment of NMR data coupled with an
X-ray crystal structure of a megalomicin A derivative (see Nakagawa and Omura,
"Structure and Stereochemistry of Macrolides" in Macrolide Antibiotics (S.
Omura, ed.), Academic Press, NY, 1984, incorporated herein by reference)
established the structures shown in Figure 3. The megalomicins are
6-O-glycosides of erythromycin C with acetyl or propionyl groups esterified at the
3''' or 4''' hydroxyls of the mycarose sugar at the C-3 position. The C-6 sugar has
been named "megosamine," although it had been identified 5 to 10 years earlier as
L-rhodosamine or N-dimethyldaunosamine, deoxyamino sugars commonly present
in the anthracycline antitumor drugs. The antibacterial potency, spectrum of
activity, and toxicity (LD50 acute, 7-7.5 g/kg s.c. or oral; subacute, >500 mg/kg) of
the megalomicins are similar to those of erythromycin A.

The megalomicins have two modes of biological activity. As antibacterials,
they act like the erythromycins, which inhibit protein synthesis at the translocation
step by selective binding to the bacterial 50S ribosomal RNA. They also affect
protein trafficking in eukaryotic cells (see Bonay et al., 1996, J. Biol. Chem.
271:3719-3726, incorporated herein by reference). Although the mechanism of
action is not entirely clear, it appears to involve inhibition of vesicular transport
between the medial and trans Golgi, resulting in under-sialylation of proteins. The
megalomicins also strongly inhibit the ATP-dependent acidification of lysosomes
in vivo (see Bonay et al., 1997, J. Cell Sci. 110:1839-1849, incorporated herein by
reference) and cause an anomalous glycosylation of viral proteins, which may be
responsible for their antiviral activity against herpes (Tox50, 70-100 µM; see
Alarcon et al., 1984, Antivir. Res. 4:231-243, and Alarcon et al., 1988, FEBS Lett.
231:207-211, both of which are incorporated herein by reference).

Strikingly, the megalomicins are potent antiparasitic agents, showing an
IC50 of 1 µg/ml in blocking intracellular replication of Plasmodium falciparum
in infected erythrocytes (see Bonay et al., 1998, Antimicrob. Agents Chemother.
42:2668-2673, incorporated herein by reference). The megalomicins are effective
against Trypanosoma cruzi and T. brucei (IC50, 0.2-2 ng/ml) plus Leishmania
donovani and L. major promastigotes (IC50, 3 and 8 µg/ml, respectively).
Megalomicin is also active against the intracellular replicative, amastigote form of
T. cruzi, completely preventing its replication in infected murine LLC/MK2
macrophages at a dose of 5 µg/ml. Importantly, the effective drug concentration is
500-fold less than the acute LD50 in mammals, and there is no toxicity to BALB/c
mice at doses (50 mg/kg) that are completely curative for T. brucei infections.
Because the erythromycins do not have such activity, although azithromycin
(Figure 3) has been reported to be an effective acute and prophylactic treatment for
malaria caused by P. vivax and P. falciparum (see Taylor et al., 1999, Clin. Infect.
Dis. 28:74-81, incorporated herein by reference), the antiparasitic action of the
megalomicins is unique and probably related to the presence of the deoxyamino
sugar megosamine at C-6 (Figure 3). Consequently, the megalomicins could be
developed into potent antimalarial drugs with a high therapeutic index and be
active against P. falciparum and other species that are resistant to currently used
classes of antimalarials. They also could lead to potent antiparasitic agents against
leishmaniasis, trypanosomiasis, and Chagas' disease. In view of the widespread
use of the erythromycins and their good oral availability plus the low mammalian
toxicity of macrolides in general, the megalomicins could be used prophylactically
to combat malaria, and as fermentation products, the megalomicins should be
relatively inexpensive to produce.

The megalomicins belong to the polyketide class of natural products whose
members have diverse structural and pharmacological properties (see Monaghan
and Tkacz, 1990, Annu. Rev. Microbiol. 44: 271, incorporated herein by
reference). The megalomicins are assembled by polyketide synthases through
successive condensations of activated coenzyme-A thioester monomers derived
from small organic acids such as acetate, propionate, and butyrate. Active sites
required for condensation include an acyltransferase (AT), acyl carrier protein
(ACP), and beta-ketoacylsynthase (KS). Each condensation cycle results in a
beta-keto group that undergoes all, some, or none of a series of processing
activities. Active sites that perform these reactions include a ketoreductase (KR),
dehydratase (DH), and enoylreductase (ER). Thus, the absence of any beta-keto
processing domain results in the presence of a ketone, a KR alone gives rise to a
hydroxyl, a KR and DH result in an alkene, while a KR, DH, and ER combination
leads to complete reduction to an alkane. After assembly of the polyketide chain,
the molecule typically undergoes cyclization(s) and post-PKS modification (e.g.,
glycosylation, oxidation, acylation) to achieve the final active compound.
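
The relationship between the processing domains present in a module and the functional group left at the beta-carbon can be sketched as a small lookup. The following Python fragment is only an illustration of the rule stated above; the function name and the error handling for combinations the text does not discuss are assumptions made for the example.

```python
# Illustrative only: maps the set of beta-keto processing domains in a module
# to the beta-carbon functional group, following the rule stated in the text.
# The function name is hypothetical.

def beta_carbon_outcome(domains):
    """Return the functional group produced by the given processing domains."""
    present = set(domains) & {"KR", "DH", "ER"}
    if not present:
        return "ketone"        # no processing domain
    if present == {"KR"}:
        return "hydroxyl"      # ketoreduction only
    if present == {"KR", "DH"}:
        return "alkene"        # reduction, then dehydration
    if present == {"KR", "DH", "ER"}:
        return "alkane"        # complete reduction
    # Combinations such as DH without KR are not described in the text.
    raise ValueError(f"combination not covered: {sorted(present)}")


if __name__ == "__main__":
    for combo in ([], ["KR"], ["KR", "DH"], ["KR", "DH", "ER"]):
        print(combo, "->", beta_carbon_outcome(combo))
```

Domains other than KR, DH, and ER (for example KS, AT, and ACP) are ignored by the lookup, since they take part in condensation rather than beta-keto processing.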

Macrolides such as erythromycin and megalomicin are synthesized by
modular PKSs (see Cane et al., 1998, Science 282: 63, incorporated herein by
reference). For illustrative purposes, the PKS that produces the erythromycin
polyketide (6-deoxyerythronolide B synthase or DEBS; see U.S. Patent No.
5,824,513, incorporated herein by reference) is shown in Figure 4. DEBS is the
most characterized and extensively used modular PKS system. DEBS is
particularly relevant to the present invention in that it synthesizes the same
polyketide, 6-deoxyerythronolide B (6-dEB), synthesized by the megalomicin
PKS. In modular PKS enzymes such as DEBS and the megalomicin PKS, the
enzymatic steps for each round of condensation and reduction are encoded within
a single "module" of the polypeptide (i.e., one distinct module for every
condensation cycle). DEBS consists of a loading module and 6 extender modules
and a chain terminating thioesterase (TE) domain within three extremely large
polypeptides encoded by three open reading frames (ORFs, designated eryAI,
eryAII, and eryAIII).

Each of the three polypeptide subunits of DEBS (DEBSI, DEBSII, and
DEBSIII) contains 2 extender modules; DEBSI additionally contains the loading
module. Collectively, these proteins catalyze the condensation and appropriate
reduction of 1 propionyl CoA starter unit and 6 methylmalonyl CoA extender
units. Modules 1, 2, 5, and 6 contain KR domains; module 4 contains a complete
set, KR/DH/ER, of reductive and dehydratase domains; and module 3 contains no
functional reductive domain. Following the condensation and appropriate
dehydration and reduction reactions, the enzyme bound intermediate is lactonized
by the TE at the end of extender module 6 to form 6-dEB.
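
The module organization just described can be written down as a small data model and used for a carbon count. This is a minimal sketch, not part of the disclosure: the dictionary layout and function names are assumptions, and the count relies on the facts that a propionyl starter contributes three carbons and that each methylmalonyl extender contributes three carbons (two backbone carbons plus a methyl branch) after decarboxylation.

```python
# Toy model of DEBS as described in the text. The data layout and function
# names are assumptions made for illustration.

DEBS = {
    "DEBSI": ["load", 1, 2],   # loading module plus extender modules 1 and 2
    "DEBSII": [3, 4],
    "DEBSIII": [5, 6],         # TE follows extender module 6
}

# Reductive domains per extender module, as stated in the text.
REDUCTIVE = {
    1: {"KR"}, 2: {"KR"}, 3: set(),
    4: {"KR", "DH", "ER"}, 5: {"KR"}, 6: {"KR"},
}

STARTER_CARBONS = 3    # propionyl starter unit
EXTENDER_CARBONS = 3   # methylmalonyl unit after loss of CO2
BACKBONE_PER_EXTENDER = 2


def extender_modules():
    """Return extender module numbers in assembly-line order."""
    return [m for mods in DEBS.values() for m in mods if m != "load"]


def total_carbons():
    """Total carbons incorporated into the polyketide chain."""
    return STARTER_CARBONS + EXTENDER_CARBONS * len(extender_modules())


def backbone_carbons():
    """Carbons in the linear backbone (methyl branches excluded)."""
    return STARTER_CARBONS + BACKBONE_PER_EXTENDER * len(extender_modules())


if __name__ == "__main__":
    print("extender modules:", extender_modules())
    print("total carbons:", total_carbons())
    print("backbone carbons:", backbone_carbons())
```

Under these assumptions the model gives 21 carbons in total, which agrees with the 21 carbon atoms of 6-dEB.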

More particularly, the loading module of DEBS consists of two domains,
an acyl-transferase (AT) domain and an acyl carrier protein (ACP) domain. In
other PKS enzymes, the loading module is not composed of an AT and an ACP
but instead utilizes an inactivated KS, an AT, and an ACP. This inactivated KS is
in most instances called KS^Q, where the superscript letter is the abbreviation for
the amino acid, glutamine, that is present instead of the active site cysteine
required for activity. The AT domain of the loading module recognizes a
particular acyl-CoA (propionyl for DEBS, which can also accept acetyl) and
transfers it as a thiol ester to the ACP of the loading module. Concurrently, the AT
on each of the extender modules recognizes a particular extender-CoA
(methylmalonyl for DEBS) and transfers it to the ACP of that module to form a
thioester. Once the PKS is primed with acyl- and malonyl-ACPs, the acyl group of
the loading module migrates to form a thiol ester (trans-esterification) at the KS of
the first extender module; at this stage, extender module 1 possesses an acyl-KS
and a methylmalonyl ACP. The acyl group derived from the loading module is
then covalently attached to the alpha-carbon of the malonyl group to form a
carbon-carbon bond, driven by concomitant decarboxylation, and generating a new
acyl-ACP that has a backbone two carbons longer than the loading unit
(elongation or extension). The growing polyketide chain is transferred from the
ACP to the KS of the next module, and the process continues.

The polyketide chain, growing by two carbons each module, is sequentially
passed as a covalently bound thiol ester from module to module, in an assembly
line-like process. The carbon chain produced by this process alone would possess
a ketone at every other carbon atom, producing a polyketone, from which the
name polyketide arises. Commonly, however, the beta keto group of each two-
carbon unit is modified just after it has been added to the growing polyketide
chain but before it is transferred to the next module by either a KR, a KR plus a
DH, or a KR, a DH, and an ER. As noted above, modules may contain additional
enzymatic activities as well.

WO 01/27284    PCT/US00/27433

Once a polyketide chain traverses the final extender module of a PKS, it
encounters the releasing domain or thioesterase found at the carboxyl end of most
PKSs. Here, the polyketide is cleaved from the enzyme and cyclized. The
resulting polyketide can be modified further by tailoring or modification enzymes;
these enzymes add carbohydrate groups or methyl groups, or make other
modifications, e.g., oxidation or reduction, on the polyketide core molecule. For
example, the final steps in conversion of 6-dEB to erythromycin A include the
actions of a number of modification enzymes, which carry out C-6 hydroxylation,
attachment of mycarose and desosamine sugars, C-12 hydroxylation (which
produces erythromycin C), and conversion of mycarose to cladinose via
O-methylation, as shown in Figure 5.

With this overview of PKS and post-PKS modification enzymes, one can 
better appreciate the recombinant megalomicin biosynthetic genes provided by the 
invention and their function, as described in the following Section. 



Section III: The Megalomicin Biosynthetic Genes and Nucleic Acid Fragments 

The megalomicin PKS was isolated and cloned by the following
procedure. Genomic DNA was isolated from a megalomicin producing strain of
Micromonospora megalomicea subsp. nigra (ATCC 27598), partially digested
with a restriction enzyme, and cloned into a commercially available cosmid vector
to produce a genomic library. This library was then probed with probes generated
from the erythromycin biosynthetic genes as well as from cosmids identified as
containing sequences homologous to erythromycin biosynthetic genes. This
probing identified a set of cosmids, which were analyzed by DNA sequence
analysis and restriction enzyme digestion. This analysis revealed that the desired
DNA had been isolated and that the entire PKS gene cluster was contained in
overlapping segments on four of the cosmids identified. Figure 1 shows the
cosmids, and the portions of the megalomicin biosynthetic gene cluster in the
insert DNA of the cosmids. Figure 1 shows that the complete megalomicin
biosynthetic gene cluster is contained within the insert DNA of cosmids
pKOS079-138B, pKOS079-124B, pKOS079-93D, and pKOS079-93A. Each of
these cosmids has been deposited with the American Type Culture Collection in
accordance with the terms of the Budapest Treaty (cosmid pKOS079-138B is
available under accession no. ATCC ______; cosmid pKOS079-124B is available
under accession no. ATCC ______; cosmid pKOS079-93D is available under
accession no. ATCC ______; and cosmid pKOS079-93A is available under
accession no. ATCC ______). Various additional reagents of the invention can be
isolated from these cosmids. DNA sequence analysis was also performed on the
various subclones of the invention, as described herein. Further analysis of these
cosmids and subclones prepared from the cosmids facilitated the identification of
the location of various megalomicin biosynthetic genes, including the ORFs
encoding the PKS, modules encoded by those ORFs, and coding sequences for
megalomicin modification enzymes. The location of these genes and modules is
shown on Figure 2.

Those of skill in the art will recognize that, due to the degenerate nature of
the genetic code, a variety of DNA compounds differing in their nucleotide
sequences can be used to encode a given amino acid sequence of the invention.
The native DNA sequence encoding the megalomicin PKS and other biosynthetic
enzymes of Micromonospora megalomicea is
shown herein merely to illustrate a preferred embodiment of the invention, and the
invention includes DNA compounds of any sequence that encode the amino acid
sequences of the polypeptides and proteins of the invention. In similar fashion, a
polypeptide can typically tolerate one or more amino acid substitutions, deletions,
and insertions in its amino acid sequence without loss or significant loss of a
desired activity. The present invention includes such polypeptides with alternate
amino acid sequences, and the amino acid sequences encoded by the DNA
sequences shown herein merely illustrate preferred embodiments of the invention.
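
The degeneracy point can be illustrated with a short Python sketch. The two DNA strings below are invented toy sequences, not sequences from SEQ ID NO: 1, and the codon table is a small excerpt of the standard genetic code sufficient for the example.

```python
# Illustration of genetic-code degeneracy using toy sequences (hypothetical,
# not taken from the megalomicin gene cluster). The table is an excerpt of
# the standard genetic code.

CODON_TABLE = {
    "ATG": "M",
    "GCC": "A", "GCG": "A",
    "CTG": "L", "CTC": "L",
    "TGC": "C", "TGT": "C",
    "TAA": "*", "TGA": "*",
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


SEQ_A = "ATGGCCCTGTGCTGA"
SEQ_B = "ATGGCGCTCTGTTAA"

if __name__ == "__main__":
    print(translate(SEQ_A), translate(SEQ_B), SEQ_A == SEQ_B)
```

The two strings differ at four nucleotide positions yet translate to the same short peptide, which is the sense in which DNA compounds of differing sequence can encode a given amino acid sequence.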

The recombinant nucleic acids, proteins, and peptides of the invention are
many and diverse. To facilitate an understanding of the invention and the diverse
compounds and methods provided thereby, the following description of the
various regions of the megalomicin PKS and the megalomicin modification
enzymes and corresponding coding sequences is provided. To facilitate description
of the invention, reference to a PKS, protein, module, or domain herein can also
refer to DNA compounds comprising coding sequences therefor and vice versa.
Also, unless otherwise indicated, reference to a heterologous PKS refers to a PKS
or DNA compounds comprising coding sequences therefor from an organism
other than Micromonospora megalomicea. In addition, reference to a PKS or its
coding sequence includes reference to any portion thereof.

Thus, the invention provides DNA molecules in isolated (i.e., not pure, but
existing in a preparation in an abundance and/or concentration not found in nature)
and purified (i.e., substantially free of contaminating materials or substantially free
of materials with which the corresponding DNA would be found in nature) form.
The DNA molecules of the invention comprise one or more sequences that encode
one or more domains (or fragments of such domains) of one or more modules in
one or more of the ORFs of the megalomicin PKS and sequences that encode
megalomicin modification enzymes from the megalomicin biosynthetic gene
cluster. Examples of PKS domains include the KS, AT, DH, KR, ER, ACP, and
TE domains of at least one of the 6 extender modules and loading module of the
three proteins encoded by the three ORFs of the megalomicin PKS gene cluster.
Examples of megalomicin modification enzymes include those that synthesize the
mycarose, desosamine, and megosamine moieties, those that transfer those sugar
moieties to the polyketide 6-dEB, those that hydroxylate the polyketide at C-6 and
C-12, and those that acylate the sugar moieties.

In an especially preferred embodiment, the DNA molecule is a
recombinant DNA expression vector or plasmid, as described in more detail in the
following Section. Generally, such vectors can either replicate in the cytoplasm of
the host cell or integrate into the chromosomal DNA of the host cell. In either
case, the vector can be a stable vector (i.e., the vector remains present over many
cell divisions, even if only with selective pressure) or a transient vector (i.e., the
vector is gradually lost by host cells with increasing numbers of cell divisions).

The megalomicin PKS gene cluster comprises three ORFs (megAI, megAII,
and megAIII). Each ORF encodes two extender modules of the PKS; the first ORF
also encodes the loading module. Each extender module is composed of at least a
KS, an AT, and an ACP domain. The locations of the various encoding regions of
these ORFs are shown in Figure 2 and described with reference to the sequence
information below. The megalomicin PKS produces the polyketide known as
6-dEB, shown in Figure 4. In megalomicin-producing organisms, 6-dEB is
converted to erythromycin C by a set of modification enzymes. Thus, 6-dEB is
converted to erythronolide B by the megF gene product (a homolog of the eryF
gene product), then to 3-alpha-mycarosyl-erythronolide B by the megBV gene
product (a homolog of the eryBV gene product), then to erythromycin D by the
megCIII gene product (a homolog of the eryCIII gene product), then to
erythromycin C by the megK gene product (a homolog of the eryK gene product).
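
The four-step conversion of 6-dEB to erythromycin C can be sketched as an ordered list of substrate, gene, and product triples. The Python fragment below is a deliberately simplified illustration: it assumes a strictly linear pathway in which each enzyme accepts only the product of the preceding step, and the function name is hypothetical. It is meant only to show how the absence of one gene product would be expected to stall the pathway at the preceding intermediate under that assumption.

```python
# Simplified, strictly linear model of the 6-dEB -> erythromycin C steps named
# in the text. Real enzymes may tolerate other substrates; that is ignored here.

PATHWAY = [
    ("6-dEB", "megF", "erythronolide B"),
    ("erythronolide B", "megBV", "3-alpha-mycarosyl-erythronolide B"),
    ("3-alpha-mycarosyl-erythronolide B", "megCIII", "erythromycin D"),
    ("erythromycin D", "megK", "erythromycin C"),
]


def final_product(start, available_genes):
    """Follow the pathway from `start` until a required gene is missing."""
    compound = start
    for substrate, gene, product in PATHWAY:
        if compound != substrate:
            continue               # step does not act on the current compound
        if gene not in available_genes:
            break                  # pathway stalls at this intermediate
        compound = product
    return compound


if __name__ == "__main__":
    all_genes = {"megF", "megBV", "megCIII", "megK"}
    print(final_product("6-dEB", all_genes))
    print(final_product("6-dEB", all_genes - {"megCIII"}))
```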

In addition to these modification enzymes, such megalomicin-producing
organisms also contain the modification enzymes necessary for the biosynthesis of
the desosamine and mycarose moieties that are similarly utilized in erythromycin
biosynthesis, as shown in Figure 5. Megalomicin A contains the complete
erythromycin C structure, and its biosynthesis additionally involves the formation
of L-megosamine (L-rhodosamine) and its attachment to the C-6 hydroxyl
(Figures 3 and 5, inset), followed by acylation of the C-3''' and/or C-4'''
hydroxyls as the terminal steps. L-megosamine is the same as N-dimethyl-L-
daunosamine; the daunosamine genes have been characterized from Streptomyces
peucetius (see Colombo and Hutchinson, J. Indust. Microbiol. Biotechnol., in
press; Otten et al., 1996, J. Bacteriol. 178:7316-7321, and references cited therein).
Some of the rhodosamine genes also have been cloned and partially characterized
from another anthracycline producing Streptomyces sp. (see Torkkell et al., 1997,
Mol. Gen. Genet. 256(2):203-209). Because the timing of the glycosylation with
TDP-megosamine in relation to the addition of mycarose and desosamine to
erythronolide B, plus the C-12 hydroxylation, is unknown, the pathway could
involve a different order of glycosylation and C-12 hydroxylation steps than the
one shown in Figure 5. Regardless, the megalomicin biosynthetic gene cluster
contains the genes to make L-rhodosamine and attach it to the correct macrolide
substrate.

The biosynthetic pathways to make the glycosides desosamine, mycarose,
and megosamine are shown in Figure 6. The present invention provides the genes
for each biosynthetic pathway shown in this Figure, and these recombinant genetic
pathways can be used alone or in any combination to confer the pathway to a
heterologous host.

The megalomicin PKS locus is similar to the eryA locus in size and
organization. Most of the deoxysugar biosynthesis genes are homologs of the eryB
mycarose and eryC desosamine biosynthesis and glycosyl attachment genes from
Saccharopolyspora erythraea (see Summers et al., 1997, Microbiol. 143:3251-
3262; Haydock et al., 1991, Mol. Gen. Genet. 230:120-128; Gaisser et al., 1997,
Mol. Gen. Genet. 256:239-251; Gaisser et al., 1998, Mol. Gen. Genet. 257:78-88,
incorporated herein by reference) or the picC homologs from the picromycin and
narbomycin producer (see PCT patent publication No. 99/61599 and Xue et al.,
1998, Proc. Nat. Acad. Sci. USA 95, 12111-12116, incorporated herein by
reference). The TDP-megosamine biosynthesis genes are homologs of the dnm
genes (see Figure 5) and the pikromycin N-dimethyltransferase gene or its
homologs reported in a cluster of L-rhodosamine biosynthesis genes. The putative
TDP-megosamine glycosyltransferase gene product (geneX in Figure 5) closely
resembles the deduced products of the eryBV, eryCIII, dnmS, and pikromycin
desVII genes, even though it recognizes different substrates than the products of
each of these genes.

The following Table 1 shows the location of the genes in the
Micromonospora megalomicea megalomicin biosynthetic pathway in the DNA
sequence set forth in SEQ ID NO: 1 (see also Figure 7; note some gene
designations may be different in Figure 7).

Table 1. Megalomicin Biosynthetic Gene Cluster
Micromonospora megalomicea subsp. nigra (ATCC 27598)

Location                      Description
1..2451                       sequence from cosmid pKOS079-138B
complement(1..144)            megBVI (or megT), TDP-4-keto-6-deoxyglucose-2,3-dehydratase
928..2061                     megDVI, TDP-4-keto-6-deoxyglucose 3,4-isomerase
2072..3382                    megDI, TDP-megosaminyl transferase (eryCIII homolog)
2452..40397                   sequence of cosmid pKOS079-93D
3462..4634                    megG (or megY), mycarosyl acyltransferase
4651..5775                    megDII, deoxysugar transaminase (eryCI, DnmJ homolog)
5822..6595                    megDIII, TDP-daunosaminyl-N,N-dimethyltransferase (eryCVI homolog)
6592..7197                    megDIV, TDP-4-keto-6-deoxyglucose 3,5-epimerase (eryBVII, dnmU homolog)
7220..8206                    megDV, TDP-hexose 4-ketoreductase (eryBIV, dnmV homolog)
complement(8228..9220)        megBIIA or megDVII, TDP-4-keto-L-6-deoxy-hexose 2,3-reductase
complement(9226..10479)       megBV, TDP-mycarosyl transferase
complement(10483..11424)      megBIV, TDP-hexose 4-ketoreductase
12181..22821                  megAI
12181..13791                    Loading Module (L)
12505..13470                      AT-L
13576..13791                      ACP-L
13849..18207                    Extender Module 1 (1)
13849..15126                      KS1
15427..16476                      AT1
17155..17694                      KR1
17947..18207                      ACP1
18268..22575                    Extender Module 2 (2)
18268..19548                      KS2
19876..20910                      AT2
21517..22053                      KR2
22318..22575                      ACP2
22867..33555                  megAII
22957..27258                    Extender Module 3 (3)
22957..24237                      KS3
24544..25581                      AT3
26230..26733                      KR3 (inactive)
26998..27258                      ACP3
27313..33312                    Extender Module 4 (4)
27393..28590                      KS4
28897..29931                      AT4
29953..30477                      DH4
31396..32244                      ER4
32257..32799                      KR4
33052..33312                      ACP4
33666..43271                  megAIII
33780..38120                    Extender Module 5 (5)
33780..35027                      KS5
35385..36419                      AT5
37068..37604                      KR5
37860..38120                      ACP5
38187..42425                    Extender Module 6 (6)
38187..39470                      KS6
39795..40811                      AT6
40398..46641                  sequences from cosmid pKOS079-93A
41406..41936                      KR6
42168..42425                      ACP6
42585..43271                      TE
43268..44344                  megCII, TDP-4-keto-6-deoxyglucose 3,4-isomerase
44355..45623                  megCIII, TDP-desosaminyl transferase
45620..46591                  megBII, TDP-4-keto-6-deoxy-L-glucose 2,3-dehydratase
complement(46660..47403)      megH, TEII
complement(47411..47980)      megF, C-6 hydroxylase
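
The location strings in Table 1 follow the familiar feature-location notation, in which `a..b` denotes the forward strand and `complement(a..b)` the reverse strand. A minimal Python sketch for reading such strings is given below; the function names are hypothetical, and the parser handles only the two simple forms that appear in the table.

```python
# Minimal parser for the two location forms used in Table 1. Function names
# are hypothetical; joins, partial locations, etc. are not handled.
import re

_LOCATION = re.compile(r"^(?:complement\((\d+)\.\.(\d+)\)|(\d+)\.\.(\d+))$")


def parse_location(text):
    """Return (start, end, strand) with strand '+' or '-'."""
    match = _LOCATION.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    if match.group(1) is not None:
        return int(match.group(1)), int(match.group(2)), "-"
    return int(match.group(3)), int(match.group(4)), "+"


def feature_length(text):
    """Length of the feature in nucleotides (coordinates are inclusive)."""
    start, end, _ = parse_location(text)
    return end - start + 1


if __name__ == "__main__":
    for loc in ("928..2061", "complement(8228..9220)"):
        n = feature_length(loc)
        print(loc, parse_location(loc), n, "codon multiple:", n % 3 == 0)
```

Checking that a gene-level feature length is a multiple of three, as done in the usage lines, is a quick sanity check on transcribed coordinates; it does not apply to whole-cosmid or domain entries that are not complete reading frames.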

In a specific embodiment, the invention provides an isolated nucleic acid 
fragment comprising a nucleotide sequence encoding a domain of the 
megalomicin polyketide synthase or a megalomicin modification enzyme. The 
isolated nucleic acid fragment can be a DNA or an RNA. Preferably, the isolated
nucleic acid fragment is a recombinant DNA compound. A nucleotide sequence 
that is complementary to the nucleotide sequence encoding a domain of 
megalomicin PKS or a megalomicin modification enzyme is also provided. 

The isolated nucleic acid fragment can comprise a single, multiple or all
the open reading frame(s) (ORF) of the megalomicin PKS or the megalomicin
modification enzyme. Exemplary ORFs of megalomicin PKS include the ORFs of
the megAI, megAII and megAIII genes. The isolated nucleic acids of the invention
also include nucleic acids that encode one or more domains and one or more
modules of the megalomicin PKS. Exemplary domains of the megalomicin PKS
include a TE domain, a KS domain, an AT domain, an ACP domain, a KR
domain, a DH domain and an ER domain. In a preferred embodiment, the nucleic
acid comprises the coding sequence for a loading module, a thioesterase domain,
and all six extender modules of the megalomicin PKS.

Megalomicin modification enzymes include those enzymes involved in the
conversion of 6-dEB into a megalomicin such as the enzymes encoded by megF,
megBV, megCIII, megK, megDI and megG (or megY). Megalomicin modification
enzymes also include those enzymes involved in the biosynthesis of mycarose,
megosamine or desosamine, which are used as biosynthetic intermediates in the
biosynthesis of various megalomicin species and other related polyketides. The
enzymes that are involved in biosynthesis of mycarose, megosamine or
desosamine are described in Figures 5 and 10. The megalomicin PKS and
megalomicin modification enzymes are collectively referred to as megalomicin
biosynthetic enzymes; the genes encoding such enzymes are collectively referred
to as megalomicin biosynthetic genes; and nucleic acids that comprise a portion of
or entire megalomicin biosynthetic genes are collectively referred to as
megalomicin biosynthetic nucleic acid(s).

In specific embodiments, the megalomicin biosynthetic nucleic acids
comprise the sequence of SEQ ID NO: 1, or the coding regions thereof, or
nucleotide sequences encoding, in whole or in part, a megalomicin biosynthetic
enzyme protein. The isolated nucleic acids typically consist of at least 25
(continuous) nucleotides, 50 nucleotides, 100 nucleotides, 150 nucleotides, or 200
nucleotides of megalomicin biosynthetic nucleic acid sequence, or a full-length
megalomicin biosynthetic coding sequence. In another embodiment, the nucleic
acids are smaller than 35, 200, or 500 nucleotides in length. Nucleic acids can be
single or double stranded. Nucleic acids that hybridize to or are complementary to
the foregoing sequences, in particular the inverse complement to nucleic acids that
hybridize to the foregoing sequences (i.e., the inverse complement of a nucleic
acid strand has the complementary sequence running in reverse orientation to the
strand so that the inverse complement would hybridize without mismatches to the
nucleic acid strand) are also provided. In specific aspects, nucleic acids are
provided which comprise a sequence complementary to (specifically are the
inverse complement of) at least 10, 25, 50, 100, or 200 nucleotides or the entire
coding region of a megalomicin biosynthetic gene.
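
The inverse complement defined above can be computed mechanically: each base is replaced by its Watson-Crick partner and the result is read in the opposite direction. The following Python sketch uses an invented six-nucleotide example; the function name is an assumption.

```python
# Inverse (reverse) complement of a DNA strand. The example sequence is a toy
# string, not taken from SEQ ID NO: 1.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def inverse_complement(seq):
    """Complement every base, then reverse the orientation."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    strand = "ATGCCG"
    print(strand, "->", inverse_complement(strand))
```

Applying the function twice returns the original strand, which reflects the fact that two strands that are inverse complements of each other pair without mismatches.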

The megalomicin biosynthetic nucleic acids provided herein include those
with nucleotide sequences encoding substantially the same amino acid sequences
as found in native megalomicin biosynthetic enzyme proteins, and those encoding
amino acid sequences with functionally equivalent amino acids, as well as
megalomicin biosynthetic enzyme derivatives or analogs as described in Section
IV.

Some regions within the megalomicin PKS genes are highly homologous
or identical to one another, as can be readily identified by an analysis of the
sequence. The coding sequence for the KS and AT domains of module 2 shares
significant identity with the coding sequence for the KS and AT domains of
module 6. This sequence homology or identity at the nucleic acid, e.g., DNA, level
can render the nucleic acid unstable in certain host cells. To improve the stability
of the nucleic acids comprising a portion or the entire megalomicin PKS genes and
megalomicin modification enzyme genes, the nucleic acid or DNA sequences can
be changed to reduce or abolish the sequence homology or identity. Preferably,
the DNA codons of homologous regions within the PKS or the megalomicin
modification enzyme coding sequence are changed to reduce or abolish the
sequence homology or identity without changing the amino acid sequences
encoded by said changed DNA codons (see the examples below). The stability of
the nucleic acid or DNA can also be improved by codon changes that reduce or
abolish the sequence homology or identity while also changing the amino acid
sequence, provided that the amino acid sequence change(s) does not substantially
change the desired activity of the encoded megalomicin PKS. Thus, for example,
one can simply substitute for the megAIII ORF an ORF from eryAIII, oleAIII,
picAIII, or picAIV genes.

The recombinant DNA compounds of the invention that encode the 
megalomicin PKS and modification proteins or portions thereof are useful in a 
variety of applications. While many of these applications relate to the heterologous 
expression of the megalomicin biosynthetic genes or the construction of hybrid 
PKS enzymes, many useful applications involve the natural megalomicin producer 
Micromonospora megalomicea. For example, one can use the recombinant DNA 
compounds of the invention to disrupt the megalomicin biosynthetic genes by 
homologous recombination in Micromonospora megalomicea. The resulting host 
cell is a preferred host cell for making polyketides modified by oxidation, 
hydroxylation, glycosylation, and acylation in a manner similar to megalomicin, 
because the genes that encode the proteins that perform these reactions are of 
course present in the host cell, and because the host cell does not produce 
megalomicin that could interfere with production or purification of the polyketide 
of interest. 

One illustrative recombinant host cell provided by the present invention
expresses a recombinant megalomicin PKS in which the module 1 KS domain is
inactivated by deletion or other mutation. In a preferred embodiment, the
inactivation is mediated by a change in the KS domain that renders it incapable of
binding substrate (called a KS1° mutation). In a particularly preferred
embodiment, this inactivation is rendered by a mutation in the codon for the active
site cysteine that changes the codon to another codon, such as an alanine codon.
Such constructs are especially useful when placed in translational reading frame
with extender modules 1 and 2 of a megalomicin PKS or the corresponding modules
of another PKS. The utility of these constructs is that host cells expressing, or cell
free extracts containing, a PKS comprising the protein encoded thereby can be fed
or supplied with N-acylcysteamine thioesters of precursor molecules to prepare a
polyketide of interest. See U.S. patent application Serial No. 09/492,773, filed 27
Jan. 2000, and PCT patent publication No. 00/44717, both of which are
incorporated herein by reference. Such KS1° constructs of the invention are useful
in the production of 13-substituted-megalomicin compounds in Micromonospora
megalomicea host cells. Preferred compounds of the invention include those
compounds in which the substituent at the 13-position is propyl, vinyl, propargyl,
other lower alkyl, and substituted alkyl.
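
The codon-level change behind a KS1° mutation, replacing the active site cysteine codon with an alanine codon, can be illustrated on a toy sequence. In the Python sketch below the nine-nucleotide sequence and the codon index are invented for the example and do not correspond to the actual position of the active site cysteine in megAI; GCC is used as the alanine codon, although any alanine codon would serve.

```python
# Toy cysteine-to-alanine codon replacement. The sequence and codon index are
# hypothetical and do not reflect the real KS1 active-site position.

CYS_CODONS = {"TGC", "TGT"}
ALA_CODON = "GCC"   # any of GCA, GCC, GCG, GCT encodes alanine


def cys_to_ala(cds, codon_index):
    """Replace the cysteine codon at `codon_index` (0-based) with alanine."""
    start = 3 * codon_index
    codon = cds[start:start + 3]
    if codon not in CYS_CODONS:
        raise ValueError(f"codon {codon!r} at index {codon_index} is not cysteine")
    return cds[:start] + ALA_CODON + cds[start + 3:]


if __name__ == "__main__":
    toy = "GACTGCTCC"            # codons: GAC TGC TCC
    print(toy, "->", cys_to_ala(toy, 1))
```

Because the replacement codon has the same length as the original, the downstream reading frame is preserved, which matters when the mutated KS is kept in frame with the following modules.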

In a variant of this embodiment, one can employ a megalomicin PKS in
which the ACP domain of module 1 has been rendered inactive. In another
embodiment, one can delete the loading domain of the megalomicin PKS and
provide monoketide substrates for processing by the remainder of the PKS.

The compounds of the invention can also be used to construct recombinant
host cells of the invention in which coding sequences for one or more domains or
modules of the megalomicin PKS or for another megalomicin biosynthetic gene
have been deleted by homologous recombination with the Micromonospora
megalomicea chromosomal DNA. Those of skill in the art will appreciate that the
compounds used in the recombination process are characterized by their homology
with the chromosomal DNA and not by encoding a functional protein due to their
intended function of deleting or otherwise altering portions of chromosomal DNA.
For this and a variety of other applications, the compounds of the present
invention include not only those DNA compounds that encode functional proteins
but also those DNA compounds that are complementary or identical to any portion
of the megalomicin biosynthetic genes.

Thus, the invention provides a variety of modified Micromonospora
megalomicea host cells in which one or more of the megalomicin biosynthetic
genes have been mutated or disrupted. Transformation systems for M.
megalomicea have been described by Hasegawa et al., 1991, J. Bacteriol.
173:7004-11; and Takada et al., 1994, J. Antibiot. 47:1167-1170, both of which
are incorporated herein by reference. These cells are especially useful when it is
desired to replace the disrupted function with a gene product expressed by a
recombinant DNA expression vector. While such expression vectors of the
invention are described in more detail in the following Section, those of skill in
the art will appreciate that the vectors have application to M. megalomicea as well.
Such M. megalomicea host cells can be preferred host cells for expressing
megalomicin derivatives of the invention. Particularly preferred host cells of this
type include those in which the coding sequence for the loading module has been
mutated or disrupted, those in which one or more of any of the PKS gene ORFs
has been mutated or disrupted, and/or those in which the genes for one or more
modification enzymes (glycosylation, acylation, hydroxylation) have been mutated
or disrupted.

While the present invention provides many useful compounds having
application to, and recombinant host cells derived from, Micromonospora
megalomicea, many important applications of the present invention relate to the
heterologous expression of all or a portion of the megalomicin biosynthetic genes
in cells other than M. megalomicea, as described in Section V.

Section IV: The Megalomicin Biosynthetic Enzymes and Antibodies Recognizing
such Enzymes

In another specific embodiment, the invention provides a substantially
purified polypeptide, which is encoded by a nucleic acid fragment comprising a
nucleotide sequence encoding a domain of megalomicin polyketide synthase
(PKS) or a megalomicin modification enzyme. The polypeptide can comprise a
single domain, multiple domains or a full-length megalomicin PKS or
megalomicin modification enzyme. Functional fragments, analogs or derivatives
of the megalomicin PKS or megalomicin modification enzyme polypeptides are
also provided. Preferably, such fragments, analogs or derivatives can be
recognized by an antibody raised against a megalomicin PKS or megalomicin
modification enzyme. Also preferably, such fragments, analogs or derivatives
comprise an amino acid sequence that has at least 60% identity, more preferably at
least 90% identity, to their wild type counterparts.






An exemplary nucleotide sequence encoding, and the corresponding amino
acid sequence of, a megalomicin biosynthetic enzyme is disclosed in SEQ ID
NO: 1. Homologs (e.g., nucleic acids of the above-listed genes of species other
than Micromonospora megalomicea) or other related sequences (e.g., paralogs)
can be obtained by low, moderate or high stringency hybridization with all or a
portion of the particular sequence provided as a probe using methods well known
in the art for nucleic acid hybridization and cloning (e.g., as described in Section
III) in accordance with the methods of the present invention.

The megalomicin biosynthetic enzyme proteins, or domains thereof, of the
present invention can be obtained by methods well known in the art for protein
purification and recombinant protein expression in accordance with the methods
of the present invention. For recombinant expression of one or more of the
proteins, the nucleic acid containing all or a portion of the nucleotide sequence
encoding the protein can be inserted into an appropriate expression vector, i.e., a
vector that contains the necessary elements for the transcription and translation of
the inserted protein coding sequence. Transcriptional and translational signals can
be supplied by the native promoter for a megalomicin biosynthetic gene and/or
flanking regions.

A variety of host-vector systems may be utilized to express the protein
coding sequence. These include but are not limited to mammalian cell systems
infected with virus (e.g., vaccinia virus, adenovirus, and the like); insect cell
systems infected with virus (e.g., baculovirus); microorganisms such as yeast
containing yeast vectors; or bacteria transformed with bacteriophage DNA,
plasmid DNA, or cosmid DNA. The expression elements of vectors vary in their
properties. Depending on the host-vector system utilized, any one of a number of
suitable transcription and translation elements may be used.

In a specific embodiment, a vector is used that comprises a promoter
operably linked to nucleic acid sequences encoding a megalomicin biosynthetic
enzyme, or a domain, fragment, derivative or homolog thereof, one or more
origins of replication, and optionally, one or more selectable markers (e.g., an
antibiotic resistance gene).

Expression vectors containing the sequences of interest can be identified
by three general approaches: (a) nucleic acid hybridization, (b) presence or
absence of "marker" gene function, and (c) expression of the inserted sequences.
In the first approach, megalomicin biosynthetic nucleic acid sequences can be
detected by nucleic acid hybridization to probes comprising sequences
homologous and complementary to the inserted sequences. In the second
approach, the recombinant vector/host system can be identified and selected based
upon the presence or absence of certain "marker" functions (e.g., binding to an
anti-megalomicin biosynthetic enzyme antibody, resistance to antibiotics,
occlusion body formation in baculovirus, and the like) caused by insertion of the
sequences of interest in the vector. For example, if a megalomicin biosynthetic
gene, or portion thereof, is inserted within the marker gene sequence of the vector,
recombinants containing the megalomicin biosynthetic gene fragment will be
identified by the absence of the marker gene function. In the third approach,
recombinant expression vectors can be identified by assaying for the megalomicin
biosynthetic gene products expressed by the recombinant vector. Such assays can
be based, for example, on the physical or functional properties of the interacting
species in in vitro assay systems, e.g., megalomicin synthesis activity or
immunoreactivity to antibodies specific for the protein.

Once recombinant megalomicin biosynthetic genes or nucleic acids are
identified, several methods known in the art can be used to propagate them in
accordance with the methods of the present invention. Once a suitable host
system and growth conditions have been established, recombinant expression
vectors can be propagated and amplified in quantity. As previously described, the
expression vectors or derivatives which can be used include, but are not limited to:
human or animal viruses such as vaccinia virus or adenovirus; insect viruses such
as baculovirus; yeast vectors; bacteriophage vectors such as lambda phage; and
plasmid and cosmid vectors.

In addition, a host cell strain may be chosen that modulates the expression
of the inserted sequences, or modifies or processes the expressed proteins in the
specific fashion desired. Expression from certain promoters can be elevated in the
presence of certain inducers; thus expression of the genetically engineered
megalomicin biosynthetic enzymes may be controlled. Furthermore, different host
cells have characteristic and specific mechanisms for the translational and
post-translational processing and modification (e.g., glycosylation,
phosphorylation, and the like) of proteins. Appropriate cell lines or host systems
can be chosen to ensure the desired modification and processing of the foreign
protein is achieved. For example, expression in a bacterial system can be used to
produce an unglycosylated core protein, while expression in mammalian cells
ensures "native" glycosylation of a heterologous protein. Furthermore, different
vector/host expression systems may effect processing reactions to different
extents.

In particular, megalomicin biosynthetic enzyme derivatives can be made by
altering their sequences by substitutions, additions or deletions that provide for
functionally equivalent molecules. Due to the degeneracy of nucleotide coding
sequences, other DNA sequences which encode substantially the same amino acid
sequence as a megalomicin biosynthetic gene can be used in the practice of the
present invention. These include but are not limited to nucleotide sequences
comprising all or portions of megalomicin biosynthetic genes that are altered by
the substitution of different codons that encode the same amino acid residue
within the sequence, thus producing a silent change. Likewise, the megalomicin
biosynthetic enzyme derivatives of the invention include, but are not limited to,
those containing, as a primary amino acid sequence, all or part of the amino acid
sequence of megalomicin biosynthetic enzymes, including altered sequences in
which functionally equivalent amino acid residues are substituted for residues
within the sequence resulting in a silent change. For example, one or more amino
acid residues within the sequence can be substituted by another amino acid of a
similar polarity which acts as a functional equivalent, resulting in a silent
alteration. Substitutes for an amino acid within the sequence may be selected
from other members of the class to which the amino acid belongs. For example,
the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine,
valine, proline, phenylalanine, tryptophan and methionine. The polar neutral
amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and
glutamine. The positively charged (basic) amino acids include arginine, lysine and
histidine. The negatively charged (acidic) amino acids include aspartic acid and
glutamic acid.
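
By way of non-limiting illustration, the four residue classes recited above can be written as a simple lookup. The following Python sketch is not part of the disclosed methods; the names AMINO_ACID_CLASSES, residue_class and is_same_class_substitution are hypothetical, and the check only reports whether two residues fall in the same recited class, which does not by itself establish functional equivalence.

```python
# Illustrative only: the four residue classes listed in the paragraph above,
# written with standard one-letter amino acid codes.
AMINO_ACID_CLASSES = {
    "nonpolar": set("ALIVPFWM"),      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}


def residue_class(residue: str) -> str:
    """Return the name of the class that contains the given residue."""
    for name, members in AMINO_ACID_CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_same_class_substitution(original: str, replacement: str) -> bool:
    """True if both residues belong to the same class."""
    return residue_class(original) == residue_class(replacement)


print(is_same_class_substitution("L", "I"))  # leucine replaced by isoleucine
print(is_same_class_substitution("D", "K"))  # aspartic acid replaced by lysine
```

A substitution flagged as same-class by such a lookup would still need to be confirmed experimentally as a silent alteration.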

In a specific embodiment of the invention, nucleic acids encoding, and
proteins consisting of or comprising, a domain or a fragment of a megalomicin
biosynthetic enzyme consisting of at least 6 (continuous) amino acids are
provided. In other embodiments, the domain or fragment consists of at
least 10, 20, 30, 40, or 50 amino acids of a megalomicin biosynthetic enzyme. In
specific embodiments, such domains or fragments are not larger than 35, 100 or
200 amino acids. Derivatives or analogs of megalomicin biosynthetic enzyme
include but are not limited to molecules comprising regions that are substantially
homologous to megalomicin biosynthetic enzyme (in various embodiments, at least
30%, 40%, 50%, 60%, 70%, 80%, 90% or 95% identity over an amino acid
sequence of identical size or when compared to an aligned sequence in which the
alignment is done by a computer homology program known in the art in
accordance with the methods of the present invention) or whose encoding nucleic
acid is capable of hybridizing to a sequence encoding a megalomicin biosynthetic
enzyme under stringent, moderately stringent, or nonstringent conditions.
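
As a hedged illustration of the identity percentages recited above, the following Python sketch computes percent identity between two sequences that have already been aligned to the same length. It is a minimal sketch under stated assumptions, not the homology program contemplated by the specification: the function names are hypothetical, the example sequences are invented, and the choice to leave gapped columns out of the denominator is an assumption (other conventions exist).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of compared alignment columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: columns containing a gap are not scored
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Identity levels recited in the paragraph above.
THRESHOLDS = (30, 40, 50, 60, 70, 80, 90, 95)

# Hypothetical aligned peptide fragments differing at one of eight positions.
reference = "MKTAYIAK"
candidate = "MKTAYLAK"
print(percent_identity(reference, candidate))
print([t for t in THRESHOLDS if meets_threshold(reference, candidate, t)])
```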

The megalomicin biosynthetic enzyme domains, derivatives and analogs of
the invention can be produced by various methods known in the art in accordance
with the methods of the present invention. The manipulations which result in their
production can occur at the gene or protein level. For example, the cloned
megalomicin biosynthetic gene sequence can be modified by any of numerous
strategies known in the art (Sambrook et al., 1990, Molecular Cloning, A
Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory, Cold Spring Harbor,
New York) in accordance with the methods of the present invention. The
sequences can be cleaved at appropriate sites with restriction endonuclease(s),
followed by further enzymatic modification if desired, isolated, and ligated in
vitro.

Additionally, the megalomicin biosynthetic enzyme-encoding nucleotide
sequence can be mutated in vitro or in vivo, to create and/or destroy translation,
initiation, and/or termination sequences, or to create variations in coding regions
and/or form new restriction endonuclease sites or destroy pre-existing ones, to
facilitate further in vitro modification. Any technique for mutagenesis known in
the art can be used in accordance with the methods of the present invention,
including but not limited to, chemical mutagenesis and in vitro site-directed
mutagenesis (Hutchinson et al., J. Biol. Chem. 253:6551-6558 (1978)), use of
TAB® linkers (Pharmacia), and the like.

Once a recombinant cell expressing a megalomicin biosynthetic enzyme
protein, or a domain, fragment or derivative thereof, is identified, the individual
gene product can be isolated and analyzed. This is achieved by assays based on
the physical and/or functional properties of the protein, including, but not limited
to, radioactive labeling of the product followed by analysis by gel electrophoresis,
immunoassay, cross-linking to marker-labeled product, and the like.

The megalomicin biosynthetic enzyme proteins may be isolated and
purified by standard methods known in the art, for example from recombinant
host cells expressing the complexes or proteins in accordance with the methods of
the invention, including but not restricted to column chromatography (e.g., ion
exchange, affinity, gel exclusion, reversed-phase high pressure, fast protein liquid,
and the like), differential centrifugation, differential solubility, or by any other
standard technique used for the purification of proteins. Functional properties
may be evaluated using any suitable assay known in the art in accordance with the
methods of the present invention.

Alternatively, once a megalomicin biosynthetic enzyme or its domain or
derivative is identified, the amino acid sequence of the protein can be deduced
from the nucleotide sequence of the gene which encodes it. As a result, the
protein or its domain or derivative can be synthesized by standard chemical
methods known in the art in accordance with the methods of the present invention
(see Hunkapiller et al., Nature 310:105-111 (1984)).
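
The deduction of an amino acid sequence from a coding sequence can be sketched as follows. This is an illustrative example only, assuming the standard genetic code and an in-frame coding sequence; the function name translate and the two short example sequences are hypothetical and are not taken from SEQ ID NO: 1. Alternative start codons, which occur in some bacterial genes, are not handled.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate an in-frame DNA or RNA coding sequence up to the first stop."""
    seq = coding_sequence.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    protein = []
    for i in range(0, len(seq), 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


# Two hypothetical sequences that differ only by synonymous codons
# (GCC/GCG for alanine, TGC/TGT for cysteine).
print(translate("ATGGCCTGCTAA"))
print(translate("ATGGCGTGTTAA"))
```

Because the two example sequences differ only in synonymous codons, they yield the same peptide, which is the kind of silent change discussed earlier in this Section.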

Manipulations of megalomicin biosynthetic enzymes may be made at the
protein level. Included within the scope of the invention are megalomicin
biosynthetic enzyme domains, derivatives or analogs or fragments, which are
differentially modified during or after translation, e.g., by glycosylation,
acetylation, phosphorylation, amidation, derivatization by known
protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule
or other cellular ligand, and the like. Any of numerous chemical modifications
may be carried out by known techniques, including but not limited to specific
chemical cleavage by cyanogen bromide, trypsin, chymotrypsin, papain, V8
protease, NaBH4, acetylation, formylation, oxidation, reduction, metabolic
synthesis in the presence of tunicamycin, and the like.

In specific embodiments, the megalomicin biosynthetic enzymes are
modified to include a fluorescent label. In other specific embodiments, the
megalomicin biosynthetic enzyme is modified to have a heterofunctional reagent;
such heterofunctional reagents can be used to crosslink the members of the
complex.

In addition, domains, analogs and derivatives of a megalomicin
biosynthetic enzyme can be chemically synthesized. For example, a peptide
corresponding to a portion of a megalomicin biosynthetic enzyme, which
comprises the desired domain or which mediates the desired activity in vitro can
be synthesized by use of a peptide synthesizer. Furthermore, if desired,
nonclassical amino acids or chemical amino acid analogs can be introduced as a
substitution or addition into the megalomicin biosynthetic enzyme sequence.
Non-classical amino acids include but are not limited to the D-isomers of the
common amino acids, alpha-amino isobutyric acid, 4-aminobutyric acid,
2-aminobutyric acid, 6-amino hexanoic acid, Aib, 2-amino isobutyric acid,
3-amino propionic acid, ornithine, norleucine, norvaline, hydroxyproline,
sarcosine, citrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine,
cyclohexylalanine, β-alanine, fluoro-amino acids, designer amino acids such as
β-methyl amino acids, Cα-methyl amino acids, Nα-methyl amino acids, and amino
acid analogs in general. Furthermore, the amino acid can be D (dextrorotatory) or
L (levorotatory).

In cases where natural products are suspected of being mutant or are
isolated from new species, the amino acid sequence of the megalomicin
biosynthetic enzyme isolated from the natural source, as well as those expressed in
vitro, or from synthesized expression vectors in vivo or in vitro, can be determined
from analysis of the DNA sequence, or alternatively, by direct sequencing of the
isolated protein. Such analysis may be performed by manual sequencing or
through use of an automated amino acid sequenator.

The megalomicin biosynthetic enzyme proteins may also be analyzed by
hydrophilicity analysis (Hopp and Woods, Proc. Natl. Acad. Sci. USA 78:3824-
3828 (1981)). A hydrophilicity profile can be used to identify the hydrophobic
and hydrophilic regions of the proteins, and help predict their orientation in
designing substrates for experimental manipulation, such as in binding
experiments, antibody synthesis, and the like. Secondary structural analysis can
also be done to identify regions of the megalomicin biosynthetic enzyme that
assume specific structures (Chou and Fasman, Biochemistry 13:222-23 (1974)).
Manipulation, translation, secondary structure prediction, hydrophilicity and
hydrophobicity profiles, open reading frame prediction and plotting, and
determination of sequence homologies, can be accomplished using computer
software programs available in the art.
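
A hydrophilicity profile of the kind referred to above can be sketched as a sliding-window average. The following Python example is illustrative only: the per-residue values are the Hopp and Woods scale as commonly tabulated and should be checked against the cited publication before use, the six-residue window is an assumed default, and the function names and the example peptide are hypothetical.

```python
# Hopp-Woods hydrophilicity values as commonly tabulated (assumption: verify
# against the cited publication before relying on them).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3,
    "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4,
    "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3, "V": -1.5,
    "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(sequence: str, window: int = 6) -> list:
    """Average hydrophilicity of each window of consecutive residues."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [HOPP_WOODS[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def most_hydrophilic_window(sequence: str, window: int = 6):
    """Return (start index, segment, average score) of the highest window."""
    profile = hydrophilicity_profile(sequence, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window], profile[start]


# Hypothetical peptide with a hydrophobic start and a charged stretch.
print(most_hydrophilic_window("LLVVAKDEKRGG"))
```

Segments with the highest average score are candidates for surface-exposed regions and so may be of interest when choosing peptides as immunogens, subject to experimental confirmation.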

Other methods of structural analysis including but not limited to X-ray
crystallography (Engstrom, Biochem. Exp. Biol. 11:7-13 (1974)), mass
spectroscopy and gas chromatography (Methods in Protein Science, J. Wiley and
Sons, New York, 1997), and computer modeling (Fletterick and Zoller, eds., 1986,
Computer Graphics and Molecular Modeling, in Current Communications in
Molecular Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor Press,
New York) can also be employed.

The invention also provides an antibody, or a fragment or derivative
thereof, which immunospecifically binds to a domain of megalomicin polyketide
synthase (PKS) or a megalomicin modification enzyme. In a specific
embodiment, an antibody which immunospecifically binds to a domain of the
megalomicin biosynthetic enzyme encoded by a nucleic acid that hybridizes to a
nucleic acid having the nucleotide sequence set forth in SEQ ID NO: 1, or a
fragment or derivative of said antibody containing the binding domain thereof, is
provided. Preferably, the antibody is a monoclonal antibody.

The megalomicin biosynthetic enzyme protein and domains, fragments,
homologs and derivatives thereof may be used as immunogens to generate
antibodies which immunospecifically bind such immunogens. Such antibodies
include but are not limited to polyclonal, monoclonal, chimeric, single chain, Fab
fragments, and an Fab expression library.

Various procedures known in the art may be used for the production of
polyclonal antibodies to a megalomicin biosynthetic enzyme protein of the
invention, its domains, derivatives, fragments or analogs in accordance with the
methods of the present invention.

For production of the antibody, various host animals can be immunized by
injection with the native megalomicin biosynthetic enzyme protein or a synthetic
version, or a derivative of the foregoing, such as a cross-linked megalomicin
biosynthetic enzyme. Such host animals include but are not limited to rabbits,
mice, rats, and the like. Various adjuvants can be used to increase the
immunological response, depending on the host species, and include but are not
limited to Freund's (complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as lysolecithin, pluronic polyols,
polyanions, peptides, oil emulsions, dinitrophenol, and potentially useful human
adjuvants such as bacille Calmette-Guerin (BCG) and Corynebacterium parvum.

For preparation of monoclonal antibodies directed towards a megalomicin
biosynthetic enzyme or domains, derivatives, fragments or analogs thereof, any
technique that provides for the production of antibody molecules by continuous
cell lines in culture may be used. Such techniques include but are not restricted to
the hybridoma technique originally developed by Kohler and Milstein (Nature
256:495-497 (1975)), the trioma technique, the human B-cell hybridoma technique
(Kozbor et al., Immunology Today 4:72 (1983)), and the EBV hybridoma
technique to produce human monoclonal antibodies (Cole et al., in Monoclonal
Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96 (1985)). In an
additional embodiment, monoclonal antibodies can be produced in germ-free
animals (WO89/12690). Human antibodies may be used and can be obtained by
using human hybridomas (Cote et al., Proc. Natl. Acad. Sci. USA 80:2026-2030
(1983)) or by transforming human B cells with EBV virus in vitro (Cole et al., in
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96
(1985)). Techniques developed for the production of "chimeric antibodies"
(Morrison et al., Proc. Natl. Acad. Sci. USA 81:6851-6855 (1984); Neuberger et
al., Nature 312:604-608 (1984); Takeda et al., Nature 314:452-454 (1985)) by
splicing the genes from a mouse antibody molecule specific for the megalomicin
biosynthetic enzyme protein together with genes from a human antibody molecule
of appropriate biological activity can be used; such antibodies are within the scope
of this invention.

Techniques described for the production of single chain antibodies (U.S.
patent 4,946,778) can be adapted to produce megalomicin biosynthetic
enzyme-specific single chain antibodies. An additional embodiment utilizes the
techniques described for the construction of Fab expression libraries (Huse et al.,
Science 246:1275-1281 (1989)) to allow rapid and easy identification of
monoclonal Fab fragments with the desired specificity for megalomicin
biosynthetic enzyme, or domains, derivatives, or analogs thereof. Non-human
antibodies can be "humanized" by known methods (see, e.g., U.S. Patent No.
5,225,539).

Antibody fragments that contain the idiotypes of a megalomicin
biosynthetic enzyme can be generated by techniques known in the art in
accordance with the methods of the present invention. For example, such
fragments include but are not limited to: the F(ab')2 fragment which can be
produced by pepsin digestion of the antibody molecule; the Fab' fragments that
can be generated by reducing the disulfide bridges of the F(ab')2 fragment; the Fab
fragments that can be generated by treating the antibody molecule with papain
and a reducing agent; and Fv fragments.

In the production of antibodies, screening for the desired antibody can be
accomplished by techniques known in the art in accordance with the methods of
the present invention, e.g., ELISA (enzyme-linked immunosorbent assay). To
select antibodies specific to a particular domain of the megalomicin biosynthetic
enzyme, one may assay generated hybridomas for a product that binds to the
fragment of a megalomicin biosynthetic enzyme that contains such a domain.

The foregoing antibodies can be used in methods known in the art relating
to the localization and/or quantitation of megalomicin biosynthetic enzyme
proteins, e.g., for imaging these proteins or measuring levels thereof in samples, in
accordance with the methods of the present invention.

Section V: Heterologous Expression of the Megalomicin Biosynthetic Genes

In one important embodiment, the invention provides methods for the
heterologous expression of one or more of the megalomicin biosynthetic genes
and recombinant DNA expression vectors useful in the method. For purposes of
the invention, any host cell other than Micromonospora megalomicea is a
heterologous host cell. Thus, included within the scope of the invention in
addition to isolated nucleic acids encoding domains, modules, or proteins of the
megalomicin PKS and modification enzymes, are recombinant expression vectors
that include such nucleic acids. The term expression vector refers to a nucleic acid
that can be introduced into a host cell or cell-free transcription and translation
system. An expression vector can be maintained permanently or transiently in a
cell, whether as part of the chromosomal or other DNA in the cell or in any
cellular compartment, such as a replicating vector in the cytoplasm. An expression
vector also comprises a promoter that drives expression of an RNA, which
typically is translated into a polypeptide in the cell or cell extract. For efficient
translation of RNA into protein, the expression vector also typically contains a
ribosome-binding site sequence positioned upstream of the start codon of the
coding sequence of the gene to be expressed. Other elements, such as enhancers,
secretion signal sequences, transcription termination sequences, and one or more
marker genes by which host cells containing the vector can be identified and/or
selected, may also be present in an expression vector. Selectable markers, i.e.,
genes that confer antibiotic resistance or sensitivity, are preferred and confer a
selectable phenotype on transformed cells when the cells are grown in an
appropriate selective medium.

The various components of an expression vector can vary widely,
depending on the intended use of the vector and the host cell(s) in which the
vector is intended to replicate or drive expression. Expression vector components
suitable for the expression of genes and maintenance of vectors in E. coli, yeast,
Streptomyces, and other commonly used cells are widely known and commercially
available. For example, suitable promoters for inclusion in the expression vectors
of the invention include those that function in eucaryotic or procaryotic host cells.
Promoters can comprise regulatory sequences that allow for regulation of
expression relative to the growth of the host cell or that cause the expression of a
gene to be turned on or off in response to a chemical or physical stimulus. For E.
coli and certain other bacterial host cells, promoters derived from genes for
biosynthetic enzymes, antibiotic-resistance conferring enzymes, and phage
proteins can be used and include, for example, the galactose, lactose (lac),
maltose, tryptophan (trp), beta-lactamase (bla), bacteriophage lambda PL, and T5
promoters. In addition, synthetic promoters, such as the tac promoter (U.S. Patent
No. 4,551,433), can also be used.

Thus, recombinant expression vectors contain at least one expression
system, which, in turn, is composed of at least a portion of the megalomicin PKS
and/or other megalomicin biosynthetic gene coding sequences operably linked to a
promoter and optionally termination sequences that operate to effect expression of
the coding sequence in compatible host cells. The host cells are modified by
transformation with the recombinant DNA expression vectors of the invention to
contain the expression system sequences either as extrachromosomal elements or
integrated into the chromosome. The resulting host cells of the invention are
useful in methods to produce PKS and post-PKS modification enzymes as well as
polyketides and antibiotics and other useful compounds derived therefrom.

Preferred host cells for purposes of selecting vector components for
expression vectors of the present invention include fungal host cells such as yeast
and procaryotic host cells such as E. coli and Streptomyces, but mammalian host
cells can also be used. In hosts such as yeasts, plants, or mammalian cells that
ordinarily do not produce polyketides, it may be necessary to provide, also
typically by recombinant means, suitable holo-ACP synthases to convert the
recombinantly produced PKS to functionality. Provision of such enzymes is
described, for example, in PCT publication Nos. WO 97/13845 and WO 98/27203,
each of which is incorporated herein by reference. Particularly preferred host cells
for purposes of the present invention are Streptomyces and Saccharopolyspora
host cells, as discussed in greater detail below.

In a preferred embodiment, the expression vectors of the invention are
used to construct a heterologous recombinant Streptomyces host cell that expresses
a recombinant PKS of the invention. Streptomyces is a convenient host for
expressing polyketides, because polyketides are naturally produced in certain
Streptomyces species, and Streptomyces cells generally produce the precursors
needed to form the desired polyketide. Those of skill in the art will recognize that,
if a Streptomyces host cell produces any portion of a PKS enzyme or produces a
polyketide modification enzyme, the recombinant vector need drive expression of
only those genes constituting the remainder of the desired PKS enzyme or other
polyketide-modifying enzymes. Thus, such a vector may comprise only a single
ORF, with the desired remainder of the polypeptides constituting the PKS
provided by the genes on the host cell chromosomal DNA.

If a Streptomyces or other host cell ordinarily produces polyketides, it may
be desirable to modify the host so as to prevent the production of endogenous
polyketides prior to its use to express a recombinant PKS of the invention. Such
modified hosts include S. coelicolor CH999 and similarly modified S. lividans
described in U.S. Patent No. 5,672,491, and PCT publication Nos. WO 95/08548
and WO 96/40968, incorporated herein by reference. In such hosts, it may not be
necessary to provide enzymatic activities for all of the desired post-translational
modifications of the enzymes that make up the recombinantly produced PKS,
because the host naturally expresses such enzymes. In particular, these hosts
generally contain holo-ACP synthases that provide the phosphopantetheinyl
residue needed for functionality of the PKS.

The invention provides a wide variety of expression vectors for use in
Streptomyces. The replicating expression vectors of the present invention include,
for example and without limitation, those that comprise an origin of replication
from a low copy number vector, such as SCP2* (see Hopwood et al., Genetic
Manipulation of Streptomyces: A Laboratory Manual (The John Innes Foundation,
Norwich, U.K., 1985); Lydiate et al., 1985, Gene 35: 223-235; and Kieser and
Melton, 1988, Gene 65: 83-91, each of which is incorporated herein by reference),
SLP1.2 (Thompson et al., 1982, Gene 20: 51-62, incorporated herein by
reference), and pSG5(ts) (Muth et al., 1989, Mol. Gen. Genet. 219: 341-348, and
Bierman et al., 1992, Gene 116: 43-49, each of which is incorporated herein by
reference), or a high copy number vector, such as pIJ101 and pJV1 (see Katz et
al., 1983, J. Gen. Microbiol. 129: 2703-2714; Vara et al., 1989, J. Bacteriol. 171:
5872-5881; and Servin-Gonzalez, 1993, Plasmid 30: 131-140, each of which is
incorporated herein by reference). For non-replicating and integrating vectors and
generally for any vector, it is useful to include at least an E. coli origin of
replication, such as from pUC, p1P, p1I, and pBR. For phage based vectors, the
phage phiC31 and its derivative KC515 can be employed (see Hopwood et al.,
supra). Also, plasmid pSET152, plasmid pSAM, plasmids pSE101 and pSE211,
all of which integrate site-specifically in the chromosomal DNA of S. lividans, can
be employed for purposes of the present invention.

The Streptomyces recombinant expression vectors of the invention
typically comprise one or more selectable markers, including antibiotic resistance
conferring genes selected from the group consisting of the ermE (confers
resistance to erythromycin and lincomycin), tsr (confers resistance to
thiostrepton), aadA (confers resistance to spectinomycin and streptomycin), aacC4
(confers resistance to apramycin, kanamycin, gentamicin, geneticin (G418), and
neomycin), hyg (confers resistance to hygromycin), and vph (confers resistance to
viomycin) resistance conferring genes. Alternatively, several polyketides are
naturally colored, and this characteristic can provide a built-in marker for
identifying cells.

Megalomicins are currently produced only by the relatively genetically
intractable host Micromonospora megalomicea. This bacterium has not been
commonly used in the fermentation industry for the large-scale production of
antibiotics, and methods for high level production of megalomicin and its analogs
are needed. In contrast, the streptomycete bacteria have been widely used for
almost 50 years and are excellent hosts for production of megalomicin and its
analogs. Streptomyces lividans and S. coelicolor have been developed for the
expression of heterologous PKS systems. These organisms can stably maintain
cloned heterologous PKS genes, express them at high levels under controlled
conditions, and modify the corresponding PKS proteins (e.g., by
phosphopantetheinylation) so that they are capable of production of the polyketide
they encode. Furthermore, these hosts contain the necessary pathways to produce
the substrates required for polyketide synthesis, e.g., propionyl-CoA and
methylmalonyl-CoA. A wide variety of cloning and expression vectors are
available for these hosts, as are methods for the introduction and stable
maintenance of large segments of foreign DNA. Relative to Micromonospora spp.,
S. lividans and S. coelicolor grow well on a number of media and have been
adapted for high level production of polyketides in fermentors. If production levels
are low, a number of rational approaches are available to improve yield (see
Hosted and Baltz, 1996, Trends Biotechnol. 14(7):245-50, incorporated herein by
reference). Empirical methods to increase the titers of these macrolides, long since
proven effective for numerous bacterial polyketides, can also be employed.

Preferred Streptomyces host cell/vector combinations of the invention
include S. coelicolor CH999 and S. lividans K4-114 host cells, which have been
modified so as not to produce the polyketide actinorhodin, and expression vectors
derived from the pRM1 and pRM5 vectors, as described in U.S. Patent Nos.
5,830,750 and 6,022,731 and U.S. patent application Serial No. 09/181,833, filed
28 Oct. 1998, each of which is incorporated herein by reference. These vectors are
particularly preferred in that they contain promoters compatible with numerous
and diverse Streptomyces spp. Particularly useful promoters for Streptomyces host
cells include those from PKS gene clusters that result in the production of
polyketides as secondary metabolites, including promoters from aromatic (Type II)
PKS gene clusters. Examples of Type II PKS gene cluster promoters are act gene
promoters and tcm gene promoters; examples of Type I PKS gene cluster
promoters are the promoters of the spiramycin PKS genes and DEBS genes. The
present invention also provides the megalomicin biosynthetic gene promoters in
recombinant form. These promoters can be used to drive expression of the
megalomicin biosynthetic genes or any other coding sequence of interest in host
cells in which the promoter functions, particularly Micromonospora megalomicea
and generally any Streptomyces species.

As described above, particularly useful control sequences are those that
alone or together with suitable regulatory systems activate expression during
transition from growth to stationary phase in the vegetative mycelium. The
promoter contained in the aforementioned plasmid pRM5, i.e., the actI/actIII
promoter pair and the actII-ORF4 activator gene, is particularly preferred. Other
useful Streptomyces promoters include without limitation those from the ermE
gene and the melC1 gene, which act constitutively, and the tipA gene and the merA
gene, which can be induced at any growth stage. In addition, the T7 RNA
polymerase system has been transferred to Streptomyces and can be employed in
the vectors and host cells of the invention. In this system, the coding sequence for
the T7 RNA polymerase is inserted into a neutral site of the chromosome or in a
vector under the control of the inducible merA promoter, and the gene of interest is
placed under the control of the T7 promoter. As noted above, one or more
activator genes can also be employed to enhance the activity of a promoter.
Activator genes in addition to the actII-ORF4 gene described above include dnrI,
redD, and ptpA genes (see U.S. patent application Serial No. 09/181,833, supra).

To provide a preferred host cell and vector for purposes of the invention,
the megalomicin biosynthetic genes are placed on a recombinant expression vector
and transferred to the non-macrolide producing hosts Streptomyces lividans K4-114
and S. coelicolor CH999. Transformation of S. lividans K4-114 or S.
coelicolor CH999 with this expression vector results in a strain which produces
detectable amounts of megalomicin as determined by analysis of extracts by
LC/MS. As noted above, the present invention also provides recombinant DNA
compounds in which the encoded megalomicin module 1 KS domain is
inactivated (the KS1° mutation). The introduction into Streptomyces lividans or S.
coelicolor of a recombinant expression vector of the invention that encodes a
megalomicin PKS with a KS1° domain produces a host cell useful for making
polyketides by a process known as diketide feeding. The resulting host cells can be
fed or supplied with N-acylcysteamine thioesters of precursor molecules to
prepare megalomicin derivatives. Such cells of the invention are especially useful
in the production of 13-substituted-6-deoxyerythronolide B compounds in
recombinant host cells. Preferred compounds of the invention include those
compounds in which the substituent at the 13-position is propyl, vinyl, propargyl,
other lower alkyl, and substituted alkyl. In a preferred embodiment, the meg PKS
is produced from a recombinant construct in which the megAIII gene has been
altered to abolish the regions of identical coding sequence it otherwise shares with
the megAI gene, or a hybrid PKS is employed in which the megAIII gene product
has been replaced by the oleAIII gene product. Recombinant oleAIII genes are
described in, for example, PCT patent publication No. 00/026349 and U.S. patent
application Serial No. 09/428,517, filed 28 Oct. 1999, both of which are
incorporated herein by reference.

The recombinant host cells of the invention can express all of the 
megalomicin biosynthetic genes or only a subset of the same. For example, if only 
the genes for the megalomicin PKS are expressed in a host cell that otherwise does 
not produce polyketide modifying enzymes that can act on the polyketide
produced, then the host cell produces unmodified polyketides, called macrolide
aglycones. Such macrolide aglycones can be hydroxylated and glycosylated by 
adding them to the fermentation of a strain such as, for example, Streptomyces 
antibioticus or Saccharopolyspora erythraea, that contains the requisite 
modification enzymes. 

There are a wide variety of diverse organisms that can modify macrolide
aglycones to provide compounds with, or that can be readily modified to have,
useful activities. For example, as shown in Figure 5, Saccharopolyspora erythraea
can convert 6-dEB to a variety of useful compounds. The erythronolide 6-dEB is
converted by the eryF gene product to erythronolide B, which is, in turn,
glycosylated by the eryBV gene product to obtain 3-O-mycarosylerythronolide B,
which contains L-mycarose at C-3. The eryCIII gene product then converts this
compound to erythromycin D by glycosylation with D-desosamine at C-5.
Erythromycin D, therefore, differs from 6-dEB through glycosylation and by the
addition of a hydroxyl group at C-6. Erythromycin D can be converted to
erythromycin B in a reaction catalyzed by the eryG gene product by methylating
the L-mycarose residue at C-3. Erythromycin D is converted to erythromycin C by
the addition of a hydroxyl group at C-12 in a reaction catalyzed by the eryK gene
product. Erythromycin A is obtained from erythromycin C by methylation of the
mycarose residue in a reaction catalyzed by the eryG gene product. The
unmodified megalomicin compounds provided by the present invention, such as,
for example, the 6-dEB or 6-dEB analogs, produced in Streptomyces lividans, can
be provided to cultures of S. erythraea and converted to the corresponding
derivatives of erythromycins A, B, C, and D in accordance with the procedure
provided in the examples below. To ensure that only the desired compound is
produced, one can use an S. erythraea eryA mutant that is unable to produce
6-dEB but can still carry out the desired conversions (Weber et al., 1985, J.
Bacteriol. 164(1): 425-433). Also, one can employ other mutant strains, such as
eryB, eryC, eryG, and/or eryK mutants, or mutant strains having mutations in
multiple genes, to accumulate a preferred compound. The conversion can also be
carried out in large fermentors for commercial production.
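The conversion steps described above can be read as a small directed graph, which makes it easier to see which compound a given mutant strain would be expected to accumulate. The following is a minimal sketch only: the step list is transcribed from the preceding paragraph, while the function names (reachable, end_products) and the simplification that each gene is either fully active or fully inactive are assumptions introduced for illustration.

```python
from collections import deque

# Tailoring steps as described in the text: (substrate, gene, product).
STEPS = [
    ("6-dEB", "eryF", "erythronolide B"),
    ("erythronolide B", "eryBV", "3-O-mycarosylerythronolide B"),
    ("3-O-mycarosylerythronolide B", "eryCIII", "erythromycin D"),
    ("erythromycin D", "eryG", "erythromycin B"),
    ("erythromycin D", "eryK", "erythromycin C"),
    ("erythromycin C", "eryG", "erythromycin A"),
]


def reachable(start, inactive=frozenset()):
    """Return every compound reachable from `start` when the genes in
    `inactive` are treated as knocked out (hypothetical on/off model)."""
    seen = {start}
    queue = deque([start])
    while queue:
        compound = queue.popleft()
        for substrate, gene, product in STEPS:
            if substrate == compound and gene not in inactive and product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


def end_products(start, inactive=frozenset()):
    """Return the reachable compounds that no active step converts further."""
    compounds = reachable(start, inactive)
    return sorted(
        c for c in compounds
        if not any(s == c and g not in inactive for s, g, _ in STEPS)
    )


if __name__ == "__main__":
    print(end_products("6-dEB"))
    print(end_products("6-dEB", inactive={"eryG"}))
```

In this simplified model an eryG mutant fed 6-dEB ends at erythromycin C, and a strain lacking both eryG and eryK ends at erythromycin D. Real strains will also accumulate intermediates, so the sketch indicates expected end points only.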

Moreover, there are other useful organisms that can be employed to
hydroxylate and/or glycosylate the compounds of the invention. As described
above, the organisms can be mutants unable to produce the polyketide normally
produced in that organism, the fermentation can be carried out on plates or in large
fermentors, and the compounds produced can be chemically altered after
fermentation. Thus, Streptomyces venezuelae, which produces picromycin,
contains enzymes that can transfer a desosaminyl group to the C-5 hydroxyl and a
hydroxyl group to the C-12 position. In addition, S. venezuelae contains a
glucosylation activity that glucosylates the 2'-hydroxyl group of the desosamine
sugar. This latter modification reduces antibiotic activity, but the glucosyl residue
is removed by enzymatic action prior to release of the polyketide from the cell.
Another organism, S. narbonensis, contains the same modification enzymes as S.
venezuelae, except the C-12 hydroxylase. Thus, the present invention provides the
compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. narbonensis
and S. venezuelae.

Other organisms suitable for making compounds of the invention include
Micromonospora megalomicea (discussed above), Streptomyces antibioticus, S.
fradiae, and S. thermotolerans. S. antibioticus produces oleandomycin and
contains enzymes that hydroxylate the C-6 and C-12 positions, glycosylate the C-3
hydroxyl with oleandrose and the C-5 hydroxyl with desosamine, and form an
epoxide at C-8-C-8a. S. fradiae contains enzymes that glycosylate the C-5
hydroxyl with mycaminose and then the 4'-hydroxyl of mycaminose with
mycarose, forming a disaccharide. S. thermotolerans contains the same activities
as S. fradiae, as well as acylation activities. Thus, the present invention provides
the compounds produced by hydroxylation and glycosylation of the macrolide
aglycones of the invention by action of the enzymes endogenous to S. antibioticus,
S. fradiae, and S. thermotolerans.
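The modification activities attributed above to each converting organism can be collected into a simple lookup, which may help when choosing an organism for a desired set of modifications. This is an illustrative sketch only: the activity labels are shorthand coined here for the activities named in the text, and the function name hosts_with is hypothetical.

```python
# Tailoring activities per organism, transcribed from the passages above.
# The label strings are shorthand introduced for this sketch.
ACTIVITIES = {
    "Streptomyces venezuelae": {
        "C-5 desosaminylation",
        "C-12 hydroxylation",
        "2'-glucosylation of desosamine",
    },
    "Streptomyces narbonensis": {
        "C-5 desosaminylation",
        "2'-glucosylation of desosamine",
    },
    "Streptomyces antibioticus": {
        "C-6 hydroxylation",
        "C-12 hydroxylation",
        "C-3 oleandrosylation",
        "C-5 desosaminylation",
        "C-8/C-8a epoxidation",
    },
    "Streptomyces fradiae": {
        "C-5 mycaminosylation",
        "4'-mycarosylation of mycaminose",
    },
    "Streptomyces thermotolerans": {
        "C-5 mycaminosylation",
        "4'-mycarosylation of mycaminose",
        "acylation",
    },
}


def hosts_with(*required):
    """Return, in alphabetical order, the organisms that have every
    activity listed in `required`."""
    wanted = set(required)
    return sorted(name for name, acts in ACTIVITIES.items() if wanted <= acts)


if __name__ == "__main__":
    print(hosts_with("C-5 desosaminylation", "C-12 hydroxylation"))
```

The lookup records only what the text states about each organism. It does not model substrate tolerance, which determines whether a given aglycone of the invention is actually accepted by these enzymes.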

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention
directly in the host cell of interest. Thus, the recombinant genes of the invention,
which include recombinant megAI, megAII, and megAIII genes with one or more
deletions and/or insertions, including replacements of a megA gene fragment with
a gene fragment from a heterologous PKS gene (as discussed in the next Section),
can be included on expression vectors suitable for expression of the encoded gene
products in Saccharopolyspora erythraea, Streptomyces antibioticus, S.
venezuelae, S. narbonensis, Micromonospora megalomicea, S. fradiae, and S.
thermotolerans.

A number of erythromycin high-producing strains of Saccharopolyspora
erythraea and Streptomyces fradiae have been developed, and in a preferred
embodiment, the megalomicin PKS and/or other megalomicin biosynthetic genes
are introduced into such strains (or erythromycin non-producing mutants thereof)
to provide the corresponding modified megalomicin compounds in high yields.
Those of skill in the art will appreciate that S. erythraea contains the desosamine
and mycarose biosynthetic and transfer genes as well as DEBS, which, as noted
above, makes the same macrolide aglycone, 6-dEB, as the megalomicin PKS. S.
erythraea does not make megosamine or contain the corresponding transferase
gene, and does not contain the acylation gene of Micromonospora megalomicea.
Finally, the S. erythraea eryG gene product converts mycarose to cladinose, which
does not occur in M. megalomicea. Thus, the present invention provides a wide
variety of S. erythraea recombinant host cells, including, for example, those that
contain:

(i) wild-type erythromycin biosynthetic genes with recombinant
megosamine biosynthetic and transfer genes, with and without megalomicin
acylation genes;

(ii) wild-type erythromycin biosynthetic genes except eryG, with
recombinant megosamine biosynthetic and transfer genes, with and without
megalomicin acylation genes; and

(iii) as in (i) and (ii), except that the eryA genes are inactive or deleted and
recombinant megA genes have been introduced.
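Items (i) through (iii) together describe three independent choices: whether eryG is present, whether the megalomicin acylation genes are included, and whether the eryA genes are replaced by recombinant megA genes. A short enumeration makes the resulting set of host cell configurations explicit. This is a sketch for illustration; the dictionary keys and the function name strain_designs are assumptions and do not appear in the specification.

```python
from itertools import product


def strain_designs():
    """Enumerate the S. erythraea host configurations implied by items
    (i)-(iii): every combination of three yes/no choices. All designs carry
    the recombinant megosamine biosynthetic and transfer genes."""
    designs = []
    for eryG_present, acylation, meg_pks in product((True, False), repeat=3):
        designs.append({
            "PKS genes": "recombinant megA (eryA inactive or deleted)"
                         if meg_pks else "wild-type eryA",
            "eryG": "present" if eryG_present else "absent",
            "megosamine biosynthetic and transfer genes": True,
            "megalomicin acylation genes": acylation,
        })
    return designs


if __name__ == "__main__":
    for number, design in enumerate(strain_designs(), start=1):
        print(number, design)
```

The enumeration yields eight configurations, four with the native eryA genes (items (i) and (ii)) and four with recombinant megA genes (item (iii)).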

The invention provides other S. erythraea strains as well, including those 
in which any one or more of the erythromycin biosynthetic genes have been 
deleted or otherwise rendered inactive and in which at least one megalomicin 
biosynthetic gene has been introduced. 

For example, the present invention enables one to express the megosamine
genes in a Saccharopolyspora erythraea eryG mutant in which the erythromycin C
made by this mutant is converted to megalomicin A. Alternatively, one could use
an erythromycin C high-producing strain of S. erythraea in biotransformation
methods in which the erythromycin C is fed to a Streptomyces lividans strain
carrying only the megosamine biosynthesis and glycosyltransferase genes. As
another alternative, one could use a strain of S. lividans that carries suitable
erythromycin production genes along with the daunosamine biosynthesis genes
plus geneX and geneY of Figure 5, or all of the megosamine biosynthesis genes, to
produce megalomicin A.

All or some of the megalomicin gene cluster can be easily cloned under
control of a suitable promoter in pCK7 or pSET152, either in one or two plasmids,
and introduced into the Saccharopolyspora erythraea eryG mutant. The
actII-ORF4/actIp system and the phiC31/int system in pSET function well in this
organism (see Rowe et al., 1998, Gene 216:215-23, incorporated herein by
reference). Alternatively, the megosamine biosynthesis genes are introduced into
Streptomyces lividans on the same plasmids and the production of megalomicin A
or its precursor mediated by bioconversion, done by feeding erythronolide B,
3-alpha-mycarosylerythronolide B, erythromycin D or erythromycin C to the S.
lividans strain.

Lack of adequate resistance to megalomicin A in S. erythraea or S.
lividans is not expected, because both organisms have MLS resistance genes
(ermE and mgt/lrm, respectively), which confer resistance to several 14-membered
macrolides (see Cundliffe, 1989, Annu. Rev. Microbiol. 43:207-33; Jenkins and
Cundliffe, 1991, Gene 108:55-62; and Cundliffe, 1992, Gene 115:75-84, each of
which is incorporated herein by reference). One can also readily determine the
level of resistance of the S. erythraea eryG mutant and the S. lividans host cells to
megalomicin A, both in plate tests and in liquid medium. One can repeat the
bioconversion method using an eryG mutant of a high erythromycin A producing
S. erythraea strain (or an eryB or eryC mutant, as necessary) to determine the level
at which megalomicin A can be produced. Furthermore, if experience shows that
high level megalomicin A production requires a higher level of resistance to this
macrolide than present in S. erythraea or S. lividans, the necessary megalomicin
self-resistance genes will be cloned from M. megalomicea and moved into either
one of the heterologous hosts. This will be straightforward work since
self-resistance genes are usually found in the cluster of macrolide biosynthesis genes
and can be identified by their homology to known macrolide resistance genes
and/or by the resistance phenotype they impart to a strain that normally is
sensitive.

Alternatively, geneX and geneY (Figure 5) can be added to cassettes
containing the relevant daunosamine (dnm) biosynthesis genes (Figure 5) to
provide the ability to make TDP-megosamine in vivo and attach it to an
erythromycin aglycone. The TDP-daunosamine biosynthesis genes can be
re-cloned from Streptomyces peucetius on two compatible and mutually selectable
plasmids. When an S. lividans strain containing these two plasmids and the dnmS
gene for TDP-daunosamine glycosyltransferase is grown in the presence of added
epsilon-rhodomycinone, its glycoside with L-daunosamine, called rhodomycin D,
is produced in good yield. Thus, bioconversion of one of the erythromycins to
megalomicin A should be observed when geneX and geneY are present. One can
construct all five combinations - the two N-dimethyltransferase genes and the three
glycosyltransferase genes - to discriminate geneX and geneY from those connected
with mycarose and desosamine biosynthesis and attachment in the megalomicin
pathway.

Because the timing of megosamine addition is unknown, one can test
erythronolide B, 3-alpha-mycarosylerythronolide B, erythromycin D and
erythromycin C as substrates provided to a strain that expresses the megosamine
biosynthetic and transferase genes. There is no need to test the C3''' and/or C4'''
acylated metabolites like megalomicin C1, because these metabolites are made
from megalomicin A and not the converse, based on the precedents in the
biosynthesis of tylosin (see Arisawa et al., 1994, Appl. Environ. Microbiol. 60:
2657-2661), carbomycin (see Epp et al., 1989, Gene 55:293-301), and
midecamycin (see Hara and Hutchinson, 1992, J. Bacteriol. 174: 5141-5144). If
C-6 glycosylation of erythronolide B or 3-alpha-mycarosylerythronolide B (Figure
5) happens before addition of desosamine to C-5, then the erythromycin genes
might not be able to complete formation of megalomicin A from some mono- or
diglycoside if the erythromycin glycosyltransferases cannot tolerate a C-6
glycoside. Although unexpected, such an outcome could be circumvented in
accordance with the methods of the invention by cloning further megalomicin
biosynthesis genes into the appropriate S. erythraea background or into S. lividans
- specifically, the necessary deoxysugar biosynthesis and attachment genes - to
create a recombinant strain that produces megalomicin A.

The acyltransferase gene that adds acetate or propionate to the C3''' or
C4''' positions of mycarose in megalomicin B, C1 and C2 (Figure 3) is contained
within the cosmids of the invention and can be identified by scanning the sequence
data for the megalomicin gene cluster to locate homologs of carE and mdmB or
their acyA homologs from the tylosin producer. The carE and acyA genes govern
C4''' acylation in the carbomycin and tylosin pathways, respectively. The
megalomicin homolog has the equivalent function in megalomicin biosynthesis
(but is specific for C3''' and C4''' acylation). The gene can be cloned under
control of a suitable promoter and introduced into S. lividans to produce the
desired acyl derivative of megalomicin A. Alternatively, introduction of the carE
gene can form megalomicin B. This gene can be cloned from the carbomycin,
spiramycin or tylosin producers.

If the amount of megalomicin produced by an S. erythraea or S. lividans or
other recombinant host cell is less than desired, yield can be improved by
optimizing the growth medium and fermentation conditions, by increasing
expression of the gene(s) that appear to be rate limiting, based on the level of
pathway intermediates that are accumulated by the recombinant strain constructed,
and by reconstructing the ery, dnm, and megalomicin biosynthesis genes on
vectors like pSET152 that can be integrated into the genome to provide a more
stable recombinant strain for strain improvement.

In another embodiment, the present invention provides recombinant
vectors encoding one or more of the megosamine, desosamine, and mycarose
biosynthetic and transfer genes and heterologous host cells comprising those
vectors. In this embodiment of the invention, the heterologous host cell is typically
a cell that is unable to produce the sugar and transfer it to a polyketide unless the
vector of the invention is introduced. For example, neither Streptomyces lividans
nor S. coelicolor is naturally capable of making megosamine, desosamine, or
mycarose or transferring those moieties to a polyketide. However, the present
invention provides recombinant Streptomyces lividans and S. coelicolor host cells
that are capable of making megosamine, desosamine, and/or mycarose and
transferring those moieties to a polyketide.

Moreover, additional recombinant gene products can be expressed in the
host cell to improve production of a desired polyketide. As but one non-limiting
example, certain of the recombinant PKS proteins of the invention may produce a
polyketide other than or in addition to the predicted polyketide, because the
polyketide is cleaved from the PKS by the thioesterase (TE) domain in module 6
prior to processing by other domains on the PKS, in particular, any KR, DH,
and/or ER domains in module 6. The production of the predicted polyketide can
be increased in such instances by deleting the TE domain coding sequences from
the gene and, optionally, expressing the TE domain as a separate protein. See
Gokhale et al., Feb. 1999, "Mechanism and specificity of the terminal thioesterase
domain from the erythromycin polyketide synthase," Chem. & Biol. 6: 117-125,
incorporated herein by reference.

Thus, in one important aspect, the present invention provides methods,
expression vectors, and recombinant host cells that enable the production of
megalomicin and hydroxylated and glycosylated derivatives of megalomicin in
heterologous host cells. The present invention also provides methods for making a
wide variety of polyketides derived in part from the megalomicin PKS or other
biosynthetic genes, as described in the following Section.

Section VI: Hybrid PKS Genes

The present invention provides recombinant DNA compounds encoding
each of the domains of each of the modules of the megalomicin PKS as well as the
other megalomicin biosynthetic enzymes. The availability of these compounds
permits their use in recombinant procedures for production of desired portions of
the megalomicin PKS fused to or expressed in conjunction with all or a portion of
a heterologous PKS and, optionally, one or more polyketide modification
enzymes. These compounds also permit the modification of polyketides with the
various megalomicin modification enzymes. The resulting hybrid PKS can then be
expressed in a host cell to produce a desired polyketide or modified form thereof.

Thus, in accordance with the methods of the invention, a portion of the
megalomicin biosynthetic gene coding sequence that encodes a particular activity
can be isolated and manipulated, for example, to replace the corresponding region
in a different modular PKS gene or modification enzyme gene. In addition, coding
sequences for individual proteins, modules, domains, and portions thereof of the
megalomicin PKS can be ligated into suitable expression systems and used to
produce the portion of the protein encoded. The resulting protein can be isolated
and purified or can be employed in situ to effect polyketide synthesis.
Depending on the host for the recombinant production of the domain, module,
protein, or combination of proteins, suitable control sequences such as promoters,
termination sequences, enhancers, and the like are ligated to the nucleotide
sequence encoding the desired protein in the construction of the expression vector,
as described above.

In one important embodiment, the invention thus provides hybrid PKS
enzymes and the corresponding recombinant DNA compounds that encode those
hybrid PKS enzymes. For purposes of the invention, a hybrid PKS is a
recombinant PKS that comprises all or part of one or more extender modules,
loading module, and/or thioesterase/cyclase domain of a first PKS and all or part
of one or more extender modules, loading module, and/or thioesterase/cyclase
domain of a second PKS. In one preferred embodiment, the first PKS is most but
not all of the megalomicin PKS, and the second PKS is only a portion of a
non-megalomicin PKS. An illustrative example of such a hybrid PKS includes a
megalomicin PKS in which the megalomicin PKS loading module has been
replaced with a loading module of another PKS. Another example of such a hybrid
PKS is a megalomicin PKS in which the AT domain of extender module 3 is
replaced with an AT domain that binds only malonyl CoA. In another preferred
embodiment, the first PKS is most but not all of a non-megalomicin PKS, and the
second PKS is only a portion of the megalomicin PKS. An illustrative example of
such a hybrid PKS includes a rapamycin PKS in which an AT specific for malonyl
CoA is replaced with the AT from the megalomicin PKS specific for
methylmalonyl CoA. Other illustrative hybrid PKSs of the invention are described
below.
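The domain replacement described above, such as exchanging the AT domain of extender module 3 for an AT that binds only malonyl CoA, can be pictured with a small data model. The sketch below is illustrative only: the class and function names (Domain, Module, swap_at, is_hybrid) are hypothetical, the module is given a minimal KS-AT-ACP composition without any KR, DH, or ER domains, and the substrate assigned to the native AT and the donor label are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Domain:
    kind: str            # e.g. "KS", "AT", "KR", "ACP"
    source: str          # PKS from which the coding sequence originates
    substrate: str = ""  # only meaningful for AT domains


@dataclass(frozen=True)
class Module:
    name: str
    domains: Tuple[Domain, ...]


def swap_at(module, new_at):
    """Return a copy of `module` whose AT domain is replaced by `new_at`."""
    if new_at.kind != "AT":
        raise ValueError("replacement domain must be an AT")
    if not any(d.kind == "AT" for d in module.domains):
        raise ValueError(f"{module.name} has no AT domain")
    new_domains = tuple(new_at if d.kind == "AT" else d for d in module.domains)
    return replace(module, domains=new_domains)


def is_hybrid(modules):
    """A set of modules is hybrid if its domains come from more than one PKS."""
    return len({d.source for m in modules for d in m.domains}) > 1


# Hypothetical, simplified extender module 3 of the megalomicin PKS.
module3 = Module("extender module 3", (
    Domain("KS", "megalomicin PKS"),
    Domain("AT", "megalomicin PKS", "methylmalonyl CoA"),
    Domain("ACP", "megalomicin PKS"),
))

# Hypothetical donor AT specific for malonyl CoA.
donor_at = Domain("AT", "heterologous PKS", "malonyl CoA")
hybrid_module3 = swap_at(module3, donor_at)

if __name__ == "__main__":
    print(hybrid_module3)
    print(is_hybrid([module3]), is_hybrid([hybrid_module3]))
```

The model treats a domain swap as a pure substitution and leaves the original module unchanged. It says nothing about linker compatibility or whether the resulting enzyme is active, which must be determined experimentally.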

Those of skill in the art will recognize that all or part of either the first or
second PKS in a hybrid PKS of the invention need not be isolated from a naturally
occurring source. For example, only a small portion of an AT domain determines
its specificity. See PCT patent application No. WO US99/15047, and Lau et al.,
infra, incorporated herein by reference. The state of the art in DNA synthesis
allows the artisan to construct de novo DNA compounds of size sufficient to
construct a useful portion of a PKS module or domain. Thus, the desired
derivative coding sequences can be synthesized using standard solid phase
synthesis methods such as those described by Jaye et al., 1984, J. Biol. Chem. 259:
6331, and instruments for automated synthesis are available commercially from,
for example, Applied Biosystems, Inc. For purposes of the invention, such
synthetic DNA compounds are deemed to be a portion of a PKS.

With this general background regarding hybrid PKSs of the invention, one
can better appreciate the benefit provided by the DNA compounds of the invention
that encode the individual domains, modules, and proteins that comprise the
megalomicin PKS. As described above, the megalomicin PKS is composed of a
loading module, six extender modules, each composed of a KS, AT, ACP, and zero,
one, two, or three of the KR, DH, and ER domains, and a thioesterase domain. The
DNA compounds of the invention that encode these domains individually or in
combination are useful in the construction of the hybrid PKS encoding DNA
compounds of the invention. For example, a DNA compound of the invention that
encodes an extender module or portion of an extender module is useful in the
construction of a coding sequence that encodes a protein subcomponent of a PKS.
The DNA compound of the invention that comprises a coding sequence of a PKS
subunit protein is useful in the construction of an expression vector that drives
expression of the subunit in a host cell that expresses the other subunits and so
produces a functional PKS.

The recombinant DNA compounds of the invention that encode the
loading module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS
loading module is inserted into a DNA compound that comprises the coding
sequence for one or more heterologous PKS extender modules. The resulting
construct, in which the coding sequence for the loading module of the
heterologous PKS is replaced by the coding sequence for the megalomicin
PKS loading module, provides a novel PKS. Examples of such heterologous PKS
coding sequences include the DEBS, rapamycin, FK-506, FK-520, rifamycin, and
avermectin PKS coding sequences. In another embodiment, a DNA compound
comprising a sequence that encodes the megalomicin PKS loading module is
inserted into a DNA compound that comprises the coding sequence for the
megalomicin PKS or a recombinant megalomicin PKS that produces a
megalomicin derivative.

In another embodiment, a portion of the loading module coding sequence
is utilized in conjunction with a heterologous coding sequence. In this embodiment,
the invention provides, for example, replacing the methylmalonyl CoA (propionyl)
specific AT with a malonyl CoA (acetyl), ethylmalonyl CoA (butyryl), or other
CoA specific AT. In addition, the AT and/or ACP can be replaced by another AT
and/or another ACP or an inactivated KS, such as a KS°, an AT, and/or another
ACP. The resulting heterologous loading module coding sequence can be utilized
in conjunction with a coding sequence for a PKS that synthesizes megalomicin, a
megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the first 
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS first 
extender module is inserted into a DNA compound that comprises the coding 
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
first extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for modules of the heterologous PKS, provides a novel PKS 
coding sequence. In another embodiment, a DNA compound comprising a 
sequence that encodes the first extender module of the megalomicin PKS is
inserted into a DNA compound that comprises coding sequences for the
megalomicin PKS or a recombinant megalomicin PKS that produces a 
megalomicin derivative. 

In another embodiment, a portion or all of the first extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or 
2-hydroxymalonyl CoA specific AT; deleting (which includes inactivating) the 
KR; inserting a DH or a DH and ER; and/or replacing the KR with another KR, a 
DH and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be
replaced with another KS and/or ACP. In each of these replacements or insertions,
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate 
from a coding sequence for another module of the megalomicin PKS, from a gene 
for a PKS that produces a polyketide other than megalomicin, or from chemical 
synthesis. The resulting heterologous first extender module coding sequence can
be utilized in conjunction with a coding sequence for a PKS that synthesizes
megalomicin, a megalomicin derivative, or another polyketide. 
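
By way of illustration only, the effect of the reductive domain changes listed above on the beta-carbon of the unit processed by a module can be summarized in a short Python sketch. The function name and the string labels are hypothetical and form no part of the invention. The sketch assumes the generally understood order of beta-carbon processing (ketoreduction by the KR, then dehydration by the DH, then enoyl reduction by the ER) and further assumes that a DH or ER lacking its upstream partner has no effect.

```python
# Illustrative sketch only; names and labels are hypothetical.
VALID_DOMAINS = {"KR", "DH", "ER"}


def beta_carbon_state(active_reductive_domains):
    """Return the expected beta-carbon functionality for a module.

    `active_reductive_domains` is any iterable of the active reductive
    domains in the module, e.g. {"KR"} or {"KR", "DH", "ER"}.
    """
    active = set(active_reductive_domains)
    unknown = active - VALID_DOMAINS
    if unknown:
        raise ValueError(f"unknown domain(s): {sorted(unknown)}")
    if "KR" not in active:
        return "ketone"       # KR deleted or inactivated: beta-keto group retained
    if "DH" not in active:
        return "hydroxyl"     # KR only: keto group reduced to a hydroxyl
    if "ER" not in active:
        return "double bond"  # KR and DH: hydroxyl eliminated to an alkene
    return "methylene"        # KR, DH, and ER: fully reduced


# Deleting the KR of a KR-only module, or replacing it with larger segments.
for domains in (set(), {"KR"}, {"KR", "DH"}, {"KR", "DH", "ER"}):
    print(sorted(domains), "->", beta_carbon_state(domains))
```

Under these assumptions, deleting the KR of a module yields a keto group at the corresponding position, whereas replacing the KR with a DH and KR, or with a DH, KR, and ER, yields a double bond or a methylene, respectively.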

Those of skill in the art will recognize, however, that deletion of the KR 
domain of extender module 1 or insertion of a DH domain or DH and KR domains
into extender module 1 will prevent the typical cyclization of the polyketide at the
hydroxyl group created by the KR if such hybrid module is employed as a first
extender module in a hybrid PKS or is otherwise involved in producing a portion
of the polyketide at which cyclization is to occur. Such deletions or insertions can
be useful, however, to create linear molecules or to induce cyclization at another
site in the molecule. 

As noted above, the invention also provides recombinant PKSs and 
recombinant DNA compounds and vectors that encode such PKSs in which the 
KS domain of the first extender module has been inactivated. Such constructs are
typically expressed in translational reading frame with the first two extender
modules on a single protein, with the remaining modules and domains of a 
megalomicin, megalomicin derivative, or hybrid PKS expressed as one or more, 
typically two, proteins to form the multi-protein functional PKS. The utility of 
these constructs is that host cells expressing, or cell free extracts containing, the
PKS encoded thereby can be fed or supplied with N-acylcysteamine thioesters of
precursor molecules to prepare megalomicin derivative compounds. See U.S. 
patent application Serial No. 09/492,733, filed 27 Jan. 2000, and PCT publication 
Nos. WO 00/44717, 99/03986 and 97/02358, each of which is incorporated herein 
by reference. 

The recombinant DNA compounds of the invention that encode the second
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS 
second extender module is inserted into a DNA compound that comprises the
coding sequence for a heterologous PKS. The resulting construct, in which the
coding sequence for a module of the heterologous PKS is either replaced by that 
for the second extender module of the megalomicin PKS or the latter is merely 
added to coding sequences for the modules of the heterologous PKS, provides a 
novel PKS. In another embodiment, a DNA compound comprising a sequence that
encodes the second extender module of the megalomicin PKS is inserted into a
DNA compound that comprises the coding sequences for the megalomicin PKS or
a recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the second extender module
coding sequence is utilized in conjunction with other PKS coding sequences to 
create a hybrid module. In this embodiment, the invention provides, for example, 
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl 
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR;
replacing the KR with another KR, a KR and a DH, or a KR, DH, and ER; and/or
inserting a DH or a DH and an ER. In addition, the KS and/or ACP can be 
replaced with another KS and/or ACP. In each of these replacements or insertions, 
the heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate
from a coding sequence for another module of the megalomicin PKS, from a
coding sequence for a PKS that produces a polyketide other than megalomicin, or
from chemical synthesis. The resulting heterologous second extender module
coding sequence can be utilized in conjunction with a coding sequence from a
PKS that synthesizes megalomicin, a megalomicin derivative, or another
polyketide.

The recombinant DNA compounds of the invention that encode the third 
extender module of the megalomicin PKS and the corresponding polypeptides 
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS third
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding 
sequence for a module of the heterologous PKS is either replaced by that for the 
third extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the third extender module of the megalomicin PKS is inserted into a DNA 
compound that comprises coding sequences for the megalomicin PKS or a 
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion or all of the third extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting the inactive KR; and/or
replacing the KR with an active KR, or a KR and DH, or a KR, DH, and ER. In
addition, the KS and/or ACP can be replaced with another KS and/or ACP. In 
each of these replacements or insertions, the heterologous KS, AT, DH, KR, ER, 
or ACP coding sequence can originate from a coding sequence for another module 
of the megalomicin PKS, from a gene for a PKS that produces a polyketide other
than megalomicin, or from chemical synthesis. The resulting heterologous third 
extender module coding sequence can be utilized in conjunction with a coding 
sequence for a PKS that synthesizes megalomicin, a megalomicin derivative, or 
another polyketide. 

The recombinant DNA compounds of the invention that encode the fourth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a 
DNA compound comprising a sequence that encodes the megalomicin PKS fourth 
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the 
fourth extender module of the megalomicin PKS or the latter is merely added to 
coding sequences for the modules of the heterologous PKS, provides a novel PKS. 
In another embodiment, a DNA compound comprising a sequence that encodes
the fourth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises coding sequences for the megalomicin PKS or a 
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion of the fourth extender module coding 
sequence is utilized in conjunction with other PKS coding sequences to create a
hybrid module. In this embodiment, the invention provides, for example, replacing
the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl CoA, or 
2-hydroxymalonyl CoA specific AT; deleting or inactivating any one, two, or all 
three of the ER, DH, and KR; and/or replacing any one, two, or all three of the ER, 
DH, and KR with either a KR, a DH and KR, or a KR, DH, and ER. In addition,
the KS and/or ACP can be replaced with another KS and/or ACP. In each of these
replacements or insertions, the heterologous KS, AT, DH, KR, ER, or ACP coding
sequence can originate from a coding sequence for another module of the
megalomicin PKS (except for the DH and ER domains), from a coding sequence
for a PKS that produces a polyketide other than megalomicin, or from chemical
synthesis. The resulting heterologous fourth extender module coding sequence can
be utilized in conjunction with a coding sequence for a PKS that synthesizes
megalomicin, a megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the fifth
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS fifth
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
fifth extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the fifth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequence for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative.

In another embodiment, a portion or all of the fifth extender module
coding sequence is utilized in conjunction with other PKS coding sequences to
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting (or inactivating) the KR;
inserting a DH or a DH and ER; and/or replacing the KR with another KR, a DH
and KR, or a DH, KR, and ER. In addition, the KS and/or ACP can be replaced
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a
coding sequence for another module of the megalomicin PKS, from a coding
sequence for a PKS that produces a polyketide other than megalomicin, or from
chemical synthesis. The resulting heterologous fifth extender module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide.

The recombinant DNA compounds of the invention that encode the sixth 
extender module of the megalomicin PKS and the corresponding polypeptides
encoded thereby are useful for a variety of applications. In one embodiment, a
DNA compound comprising a sequence that encodes the megalomicin PKS sixth
extender module is inserted into a DNA compound that comprises the coding
sequence for a heterologous PKS. The resulting construct, in which the coding
sequence for a module of the heterologous PKS is either replaced by that for the
sixth extender module of the megalomicin PKS or the latter is merely added to
coding sequences for the modules of the heterologous PKS, provides a novel PKS.
In another embodiment, a DNA compound comprising a sequence that encodes
the sixth extender module of the megalomicin PKS is inserted into a DNA
compound that comprises the coding sequences for the megalomicin PKS or a
recombinant megalomicin PKS that produces a megalomicin derivative. 

In another embodiment, a portion or all of the sixth extender module 
coding sequence is utilized in conjunction with other PKS coding sequences to 
create a hybrid module. In this embodiment, the invention provides, for example,
replacing the methylmalonyl CoA specific AT with a malonyl CoA, ethylmalonyl
CoA, or 2-hydroxymalonyl CoA specific AT; deleting or inactivating the KR or
replacing the KR with another KR, a KR and DH, or a KR, DH, and an ER; and/or
inserting a DH or a DH and ER. In addition, the KS and/or ACP can be replaced
with another KS and/or ACP. In each of these replacements or insertions, the
heterologous KS, AT, DH, KR, ER, or ACP coding sequence can originate from a
coding sequence for another module of the megalomicin PKS, from a coding
sequence for a PKS that produces a polyketide other than megalomicin, or from
chemical synthesis. The resulting heterologous sixth extender module coding
sequence can be utilized in conjunction with a coding sequence for a PKS that
synthesizes megalomicin, a megalomicin derivative, or another polyketide.

The sixth extender module of the megalomicin PKS is followed by a 
thioesterase domain. This domain is important in the cyclization of the polyketide 
and its cleavage from the PKS. The present invention provides recombinant DNA 
compounds that encode hybrid PKS enzymes in which the megalomicin PKS is
fused to a heterologous thioesterase or a heterologous PKS is fused to the
megalomicin PKS thioesterase. Thus, for example, a thioesterase domain coding
sequence from another PKS gene can be inserted at the end of the sixth (or other
final) extender module coding sequence in recombinant DNA compounds of the
invention or the megalomicin PKS thioesterase can be similarly fused to a
heterologous PKS. Recombinant DNA compounds encoding this thioesterase
domain are useful in constructing DNA compounds that encode the megalomicin
PKS, a PKS that produces a megalomicin derivative, and a PKS that produces a
polyketide other than megalomicin or a megalomicin derivative.

Thus, the hybrid modules of the invention are incorporated into a PKS to 
provide a hybrid PKS of the invention. A hybrid PKS of the invention can result 
not only: 

(i) from fusions of heterologous domain (where heterologous means the
domains in a module are derived from at least two different naturally occurring
modules) coding sequences to produce a hybrid module coding sequence
contained in a PKS gene whose product is incorporated into a PKS,
but also:

(ii) from fusions of heterologous modules (where heterologous module
means two modules are adjacent to one another that are not adjacent to one
another in naturally occurring PKS enzymes) coding sequences to produce a
hybrid coding sequence contained in a PKS gene whose product is incorporated
into a PKS,

(iii) from expression of one or more megalomicin PKS genes with one or
more non-megalomicin PKS genes, including both naturally occurring and
recombinant non-megalomicin PKS genes, and

(iv) from combinations of the foregoing.

Various hybrid PKSs of the invention illustrating these various alternatives are 
described herein. 

An example of a hybrid PKS comprising fused modules results from
fusion of the loading module of either the DEBS PKS or the narbonolide PKS (see
PCT patent application No. US99/11814, incorporated herein by reference) with
extender modules 1 and 2 of the megalomicin PKS to produce a hybrid megAI
gene. Co-expression of either one of these two hybrid megAI genes with the
megAII and megAIII genes in suitable host cells, such as Streptomyces lividans,
results in expression of a hybrid PKS of the invention that produces
6-deoxyerythronolide B (the polyketide product of the natural megA genes) in
recombinant host cells. Co-expression of either one of these two hybrid megAI
genes with the eryAII and eryAIII genes similarly results in the production of
6-dEB, while co-expression with the analogous narbonolide PKS genes, picAII,
picAIII and picAIV, results in the production of 3-deoxy-3-oxo-6-dEB
(3-keto-6-dEB), useful in the production of ketolides, compounds with potent
anti-bacterial activity.

Another example of a hybrid PKS comprising a hybrid module is prepared
by co-expressing the megAI and megAII genes with a megAIII hybrid gene
encoding extender module 5 and the KS and AT of extender module 6 of the
megalomicin PKS fused to the ACP of module 6 and the TE of the narbonolide
PKS. The resulting hybrid PKS of the invention produces 3-keto-6-dEB. This
compound can also be prepared by a recombinant megalomicin derivative PKS of
the invention in which the KR domain of module 6 of the megalomicin PKS has
been deleted. Moreover, the invention provides hybrid PKSs in which not only the
above changes have been made but also the AT domain of module 6 has been
replaced with a malonyl-specific AT. These hybrid PKSs produce
2-desmethyl-3-deoxy-3-oxo-6-dEB, a useful intermediate in the preparation of
2-desmethyl ketolides, compounds with potent antibiotic activity.

Another illustrative example of a hybrid PKS includes the hybrid PKS of
the invention resulting only from the latter change in the hybrid PKS just
described. Thus, co-expression of the megAI and megAII genes with a hybrid
megAIII gene in which the AT domain of module 6 has been replaced by a
malonyl-specific AT results in the expression of a hybrid PKS that produces
2-desmethyl-6-dEB in recombinant host cells. This compound is a useful
intermediate for making 2-desmethyl erythromycins in recombinant host cells of
the invention, as well as for making 2-desmethyl semi-synthetic ketolides.

While many of the hybrid PKSs described above are composed primarily
of megalomicin PKS proteins, those of skill in the art recognize that the present
invention provides many different hybrid PKSs, including those composed of only
a small portion of the megalomicin PKS. For example, the present invention
provides a hybrid PKS in which a hybrid eryAI gene that encodes the megalomicin
PKS loading module fused to extender modules 1 and 2 of DEBS is coexpressed
with the eryAII and eryAIII genes. The resulting hybrid PKS produces 6-dEB, the
product of the native DEBS. When the construct is expressed in
Saccharopolyspora erythraea host cells (either via integration into the
chromosome or via a vector that encodes the hybrid PKS), the resulting
recombinant host cell of the invention produces erythromycins. Another
illustrative example is the hybrid PKS of the invention composed of the megAI
and eryAII and eryAIII gene products. This construct is also useful in expressing
erythromycins in Saccharopolyspora erythraea host cells. In a preferred
embodiment, the S. erythraea host cells are eryAI mutants that do not produce
6-deoxyerythronolide B.

Another example is the hybrid PKS of the invention composed of the
products of the picAI and picAII genes (the two proteins that comprise the loading
module and extender modules 1 - 4, inclusive, of the narbonolide PKS) and the
megAIII gene. The resulting hybrid PKS produces the macrolide aglycone
3-hydroxy-narbonolide in Streptomyces lividans host cells and the corresponding
erythromycins in Saccharopolyspora erythraea host cells.

Each of the foregoing hybrid PKS enzymes of the invention, and the hybrid
PKS enzymes of the invention generally, can be expressed in a host cell that also
expresses a functional oleP gene product. The oleP gene encodes an oleandomycin
modification enzyme, and expression of the gene together with a hybrid PKS of
the invention provides the compounds of the invention in which a C-8 hydroxyl, a
C-8a or C-8-C-8a epoxide is present.

Recombinant methods for manipulating modular PKS genes to make
hybrid PKS enzymes are described in U.S. Patent Nos. 5,672,491; 5,843,718;
5,830,750; and 5,712,146; and in PCT publication Nos. 98/49315 and 97/02358,
each of which is incorporated herein by reference. A number of genetic
engineering strategies have been used with DEBS to demonstrate that the
structures of polyketides can be manipulated to produce novel natural products,
primarily analogs of the erythromycins (see the patent publications referenced
supra and Hutchinson, 1998, Curr. Opin. Microbiol. 1:319-329, and Baltz, 1998,
Trends Microbiol. 6:76-83, incorporated herein by reference). Because of the
similar activity of the megalomicin PKS and DEBS (both PKS enzymes produce
the macrolide aglycone 6-dEB), these methods can be readily applied to the
recombinant megalomicin PKS genes of the invention.
These techniques include: (i) deletion or insertion of modules to control
chain length, (ii) inactivation of reduction/dehydration domains to bypass
beta-carbon processing steps, (iii) substitution of AT domains to alter starter and
extender units, (iv) addition of reduction/dehydration domains to introduce
catalytic activities, and (v) substitution of ketoreductase (KR) domains to control
hydroxyl stereochemistry. In addition, engineered blocked mutants of DEBS have
been used for precursor directed biosynthesis of analogs that incorporate
synthetically derived starter units. For example, more than 100 novel polyketides
were produced by engineering single and combinatorial changes in multiple
modules of DEBS. Hybrid PKS enzymes based on DEBS with up to three catalytic
domain substitutions were constructed by cassette mutagenesis, in which various
DEBS domains were replaced with domains from the rapamycin PKS (see
Schwecke et al., 1995, Proc. Natl. Acad. Sci. USA 92, 7839-7843, incorporated
herein by reference) or one or more of the DEBS KR domains was deleted.
Functional single domain replacements or deletions were combined to generate
DEBS enzymes with double and triple catalytic domain substitutions (see
McDaniel et al., 1999, Proc. Natl. Acad. Sci. USA 96, 1846-1851, incorporated
herein by reference). By providing the analogous megalomicin/rapamycin hybrid
PKS enzymes, the present invention provides alternative means to make these
polyketides.
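
As a hedged illustration of technique (i), the relationship between the number of extender modules and macrolactone ring size can be expressed as simple arithmetic. The sketch below is hypothetical and rests on two stated assumptions: each extender module contributes two backbone carbons, and the ring closes onto the hydroxyl generated by the first extender module, as in 6-dEB. Under those assumptions the ring contains carbons C-1 through C-(2n+1) plus one ring oxygen, where n is the number of extender modules.

```python
# Illustrative arithmetic only; the function name is hypothetical.
def macrolactone_ring_size(n_extender_modules):
    """Ring size when lactonization occurs at the hydroxyl set by extender module 1.

    Ring atoms are the backbone carbons C-1 .. C-(2n+1) plus the ring oxygen,
    giving 2n + 2 atoms for n extender modules.
    """
    if n_extender_modules < 1:
        raise ValueError("at least one extender module is required")
    return 2 * n_extender_modules + 2


for n in (5, 6, 7):
    print(f"{n} extender modules -> {macrolactone_ring_size(n)}-membered ring")
```

With six extender modules this gives the 14-membered ring of 6-dEB; deleting or adding one module shifts the ring size by two atoms, provided the assumptions above continue to hold for the altered PKS.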

Methods for generating libraries of polyketides have been greatly improved
by cloning PKS genes as a set of three or more mutually selectable plasmids, each
carrying a different wild-type or mutant PKS gene, then introducing all possible
combinations of the plasmids with wild-type, mutant, and hybrid PKS coding
sequences into the same host (see U.S. patent application Serial No. 60/129,731,
filed 16 Apr. 1999, and PCT Pub. No. 98/27203, each of which is incorporated
herein by reference). This method can also incorporate the use of a KS1° mutant,
which by mutational biosynthesis can produce polyketides made from diketide
starter units (see Jacobsen et al., 1997, Science 277, 367-369, incorporated herein
by reference), as well as the use of a truncated gene that leads to 12-membered
macrolides or an elongated gene that leads to 16-membered ketolides. Moreover,
by utilizing in addition one or more vectors that encode glycosyl biosynthesis and
transfer genes, such as those of the present invention for megosamine,
desosamine, oleandrose, cladinose, and/or mycarose (in any combination), a large
collection of glycosylated polyketides can be prepared.
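
The multi-plasmid approach described above amounts to taking every combination of one variant per plasmid. The following Python sketch enumerates such combinations; the plasmid names and variant labels are hypothetical placeholders chosen for illustration and do not describe any construct actually made.

```python
from itertools import product

# Hypothetical variant labels: one list of alternative PKS genes per plasmid.
plasmid_variants = {
    "plasmid_1": ["megAI (wild type)", "megAI (AT1 replaced)", "megAI (KR2 deleted)"],
    "plasmid_2": ["megAII (wild type)", "megAII (AT3 replaced)"],
    "plasmid_3": ["megAIII (wild type)", "megAIII (AT6 replaced)", "megAIII (KR6 deleted)"],
}


def enumerate_combinations(variants):
    """Return every way of choosing one variant for each plasmid."""
    names = list(variants)
    return [
        dict(zip(names, combo))
        for combo in product(*(variants[name] for name in names))
    ]


library = enumerate_combinations(plasmid_variants)
print(len(library))  # 3 * 2 * 3 = 18 combinations
print(library[0])
```

The number of combinations is the product of the number of variants per plasmid, which is why a modest number of single-gene variants can yield a large library; whether each combination gives a functional PKS is an experimental question that the enumeration does not address.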

The following Table lists references describing illustrative PKS genes and 
corresponding enzymes that can be utilized in the construction of the recombinant
hybrid PKSs and the corresponding DNA compounds that encode them of the
invention. Also presented are various references describing tailoring enzymes and 
corresponding genes that can be employed in accordance with the methods of the 
invention. 

Avermectin

U.S. Pat. No. 5,252,474 to Merck.

MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied
Molecular Genetics, Baltz, Hegeman, & Skatrud, eds. (ASM), pp. 245-256, A
Comparison of the Genes Encoding the Polyketide Synthases for Avermectin,
Erythromycin, and Nemadectin.

MacNeil et al., 1992, Gene 115: 119-125, Complex Organization of the
Streptomyces avermitilis genes encoding the avermectin polyketide synthase.

Candicidin (FR008)

Hu et al., 1994, Mol. Microbiol. 14: 163-172.

Epothilone

PCT Pub. No. 00/031247 to Kosan.

Erythromycin

PCT Pub. No. 93/13663 to Abbott.

US Pat. No. 5,824,513 to Abbott.

Donadio et al., 1991, Science 252:675-9.

Cortes et al., 8 Nov. 1990, Nature 348: 176-8, An unusually large
multifunctional polypeptide in the erythromycin producing polyketide synthase of
Saccharopolyspora erythraea.

Glycosylation Enzymes

PCT Pub. No. 97/23630 to Abbott.

FK-506

Motamedi et al., 1998, The biosynthetic gene cluster for the macrolactone
ring of the immunosuppressant FK506, Eur. J. Biochem. 256: 528-534.

Motamedi et al., 1997, Structural organization of a multifunctional
polyketide synthase involved in the biosynthesis of the macrolide
immunosuppressant FK506, Eur. J. Biochem. 244: 74-80.

Methyltransferase

US 5,264,355, issued 23 Nov. 1993, Methylating enzyme from
Streptomyces MA6858. 31-O-desmethyl-FK506 methyltransferase.

Motamedi et al., 1996, Characterization of methyltransferase and
hydroxylase genes involved in the biosynthesis of the immunosuppressants FK506
and FK520, J. Bacteriol. 178: 5243-5248.

FK-520

PCT Pub. No. 00/20601 to Kosan.

See also Nielsen et al., 1991, Biochem. 30:5789-96 (enzymology of
pipecolate incorporation).

Lovastatin

U.S. Pat. No. 5,744,350 to Merck.

Narbomycin (and Picromycin)

PCT Pub. No. WO US99/61599 to Kosan.

Nemadectin

MacNeil et al., 1993, supra.

Niddamycin

Kakavas et al., 1997, Identification and characterization of the niddamycin
polyketide synthase genes from Streptomyces caelestis, J. Bacteriol. 179:
7515-7522.

Oleandomycin

Swan et al., 1994, Characterization of a Streptomyces antibioticus gene
encoding a type I polyketide synthase which has an unusual coding sequence, Mol.
Gen. Genet. 242: 358-362.

PCT Pub. No. 00/026349 to Kosan.

Olano et al., 1998, Analysis of a Streptomyces antibioticus chromosomal
region involved in oleandomycin biosynthesis, which encodes two
glycosyltransferases responsible for glycosylation of the macrolactone ring, Mol.
Gen. Genet. 259(3): 299-308.

Platenolide

EP Pub. No. 791,656 to Lilly.

Rapamycin

Schwecke et al., Aug. 1995, The biosynthetic gene cluster for the
polyketide rapamycin, Proc. Natl. Acad. Sci. USA 92:7839-7843.

Aparicio et al., 1996, Organization of the biosynthetic gene cluster for
rapamycin in Streptomyces hygroscopicus: analysis of the enzymatic domains in
the modular polyketide synthase, Gene 169: 9-16.

Rifamycin

August et al., 13 Feb. 1998, Biosynthesis of the ansamycin antibiotic
rifamycin: deductions from the molecular analysis of the rif biosynthetic gene
cluster of Amycolatopsis mediterranei S699, Chemistry & Biology, 5(2): 69-79.

Soraphen

U.S. Pat. No. 5,716,849 to Novartis.

Schupp et al., 1995, J. Bacteriology 177: 3673-3679. A Sorangium
cellulosum (Myxobacterium) Gene Cluster for the Biosynthesis of the Macrolide
Antibiotic Soraphen A: Cloning, Characterization, and Homology to Polyketide
Synthase Genes from Actinomycetes.

Spiramycin

U.S. Pat. No. 5,098,837 to Lilly.

Activator Gene

U.S. Pat. No. 5,514,544 to Lilly.

Tylosin

EP Pub. No. 791,655 to Lilly.

Kuhstoss et al., 1996, Gene 183:231-6, Production of a novel polyketide
through the construction of a hybrid polyketide synthase.

U.S. Pat. No. 5,876,991 to Lilly.

Tailoring enzymes

Merson-Davies and Cundliffe, 1994, Mol. Microbiol. 13: 349-355.
Analysis of five tylosin biosynthetic genes from the tylBA region of the
Streptomyces fradiae genome.

As the above Table illustrates, there are a wide variety of PKS genes that serve as 
readily available sources of DNA and sequence information for use in constructing 
the hybrid PKS-encoding DNA compounds of the invention.

In constructing hybrid PKSs of the invention, certain general methods may
be helpful. For example, it is often beneficial to retain the framework of the
module to be altered to make the hybrid PKS. Thus, if one desires to add DH and
ER functionalities to a module, it is often preferred to replace the KR domain of
the original module with a cognate KR, DH, and ER domain-containing segment
from another module, instead of merely inserting DH and ER domains. One can
alter the stereochemical specificity of a module by replacement of the KS domain
with a KS domain from a module that specifies a different stereochemistry. See
Lau et al., 1999, "Dissecting the role of acyltransferase domains of modular
polyketide synthases in the choice and stereochemical fate of extender units",
Biochemistry 38(5): 1643-1651, incorporated herein by reference. One can alter the
specificity of an AT domain by changing only a small segment of the domain. See
Lau et al., supra. One can also take advantage of known linker regions in PKS
proteins to link modules from two different PKSs to create a hybrid PKS. See
Gokhale et al., 16 Apr. 1999, "Dissecting and Exploiting Intermodular
Communication in Polyketide Synthases", Science 284: 482-485, incorporated
herein by reference.

The hybrid PKS-encoding DNA compounds of the invention can be and
often are hybrids of more than two PKS genes. Even where only two genes are
used, there are often two or more modules in the hybrid gene in which all or part
of the module is derived from a second (or third) PKS gene. Thus, as one
illustrative example, the invention provides a hybrid PKS that contains the
naturally occurring loading module and thioesterase domain as well as extender
modules one, two, four, and six of the megalomicin PKS and further contains
hybrid or heterologous extender modules three and five. Hybrid or heterologous
extender modules three and five contain AT domains specific for malonyl CoA
and derived from, for example, the rapamycin PKS genes.
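
The illustrative hybrid described above can be sketched as a simple data model.
The following Python fragment is only a bookkeeping aid, not a cloning protocol:
the `Module` class, the `swap_at` helper, and the assumption that every native
megalomicin extender module carries a methylmalonyl CoA-specific AT are
introduced here for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Module:
    """Hypothetical record for one extender module of a modular PKS."""
    number: int
    at_source: str      # PKS from which the AT domain is taken
    at_substrate: str   # extender unit selected by that AT domain


# Assumption for illustration: six native extender modules, each with a
# methylmalonyl CoA-specific AT domain.
native = [Module(n, "megalomicin", "methylmalonyl-CoA") for n in range(1, 7)]


def swap_at(modules, positions, at_source, at_substrate):
    """Return a new module list with the AT replaced at the given positions."""
    return [
        replace(m, at_source=at_source, at_substrate=at_substrate)
        if m.number in positions else m
        for m in modules
    ]


# Extender modules three and five receive malonyl CoA-specific AT domains
# derived from, for example, the rapamycin PKS genes.
hybrid = swap_at(native, {3, 5}, "rapamycin", "malonyl-CoA")

for m in hybrid:
    print(m.number, m.at_source, m.at_substrate)
```

Because the records are immutable and `swap_at` returns a new list, the native
layout stays available for building further variants from the same scaffold.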

The invention also provides libraries of PKS genes, PKS proteins, and
ultimately, of polyketides, that are constructed by generating modifications in the
megalomicin PKS so that the protein complexes produced have altered activities
in one or more respects and thus produce polyketides other than the natural
product of the PKS. Novel polyketides may thus be prepared, or polyketides in
general prepared more readily, using this method. By providing a large number of
different genes or gene clusters derived from a naturally occurring PKS gene
cluster, each of which has been modified in a different way from the native cluster,
an effectively combinatorial library of polyketides can be produced as a result of
the multiple variations in these activities. As will be further described below, the
metes and bounds of this embodiment of the invention can be described on the
polyketide, protein, and the encoding nucleotide sequence levels.

As described above, a modular PKS "derived from" the megalomicin or
other naturally occurring PKS includes a modular PKS (or its corresponding
encoding gene(s)) that retains the scaffolding of the utilized portion of the
naturally occurring gene. Not all modules need be included in the constructs;
however, the constructs can also comprise more than six modules. On the constant
scaffold, at least one enzymatic activity is mutated, deleted, replaced, or inserted
so as to alter the activity of the resulting PKS relative to the original (native) PKS.
Alteration results when these activities are deleted or are replaced by a different
version of the activity, or simply mutated in such a way that a polyketide other
than the natural product results from these collective activities. This occurs
because there has been a resulting alteration of the starter unit and/or extender
unit, stereochemistry, chain length or cyclization, and/or reductive or dehydration
cycle outcome at a corresponding position in the product polyketide. Where a
deleted activity is replaced, the origin of the replacement activity may come from a
corresponding activity in a different naturally occurring PKS or from a different
region of the megalomicin PKS. Any or all of the megalomicin PKS genes may be
included in the derivative or portions of any of these may be included, but the
scaffolding of a functional PKS protein is retained in whatever derivative is
constructed. The derivative preferably contains a thioesterase activity from the
megalomicin or another PKS.

Thus, a PKS derived from the megalomicin PKS includes a PKS that
contains the scaffolding of all or a portion of the megalomicin PKS. The derived
PKS also contains at least two extender modules that are functional, preferably
three extender modules, and more preferably four or more extender modules, and
most preferably six extender modules. The derived PKS also contains mutations,
deletions, insertions, or replacements of one or more of the activities of the
functional modules of the megalomicin PKS so that the nature of the resulting
polyketide is altered at both the protein and DNA sequence levels. Particular
preferred embodiments include those wherein a KS, AT, or ACP domain has been
deleted or replaced by a version of the activity from a different PKS or from
another location within the same PKS. Also preferred are derivatives where at
least one non-condensation cycle enzymatic activity (KR, DH, or ER) has been
deleted or added or wherein any of these activities has been mutated so as to
change the structure of the polyketide synthesized by the PKS.

Conversely, also included within the definition of a PKS derived from the
megalomicin PKS are functional non-megalomicin PKS modules or their
encoding genes wherein at least one domain or coding sequence therefor of a
megalomicin PKS module has been inserted. Exemplary is the use of the
megalomicin AT for extender module 2, which accepts a methylmalonyl CoA
extender unit rather than malonyl CoA, to replace a malonyl specific AT in
another PKS. Other examples include insertion of portions of non-condensation
cycle enzymatic activities or other regions of megalomicin synthase activity into a
heterologous PKS at both the DNA and protein levels.

Thus, there are at least five degrees of freedom for constructing a hybrid
PKS in terms of the polyketide that will be produced. First, the polyketide chain
length is determined by the number of extender modules in the PKS, and the
present invention includes hybrid PKSs that contain 6, as well as fewer or more
than 6, extender modules. Second, the nature of the carbon skeleton of the PKS is
determined by the specificities of the acyl transferases that determine the nature of
the extender units at each position, e.g., malonyl, methylmalonyl, ethylmalonyl, or
other substituted malonyl. Third, the loading module specificity also has an effect
on the resulting carbon skeleton of the polyketide. The loading module may use a
different starter unit, such as acetyl, butyryl, and the like. As noted above, another
method for varying loading module specificity involves inactivating the KS
activity in extender module 1 (KS1) and providing alternative substrates, called
diketides, that are chemically synthesized analogs of extender module 1 diketide
products, for extender module 2. This approach was illustrated in PCT publication
Nos. WO 97/02358 and WO 99/03986, incorporated herein by reference, wherein
the KS1 activity was inactivated through mutation. Fourth, the oxidation state at
various positions of the polyketide will be determined by the dehydratase and
reductase portions of the modules. This will determine the presence and location
of ketone and alcohol moieties and C-C double bonds or C-C single bonds in the
polyketide.

Finally, the stereochemistry of the resulting polyketide is a function of 
three aspects of the synthase. The first aspect is related to the AT/KS specificity 
associated with substituted malonyls as extender units, which affects
stereochemistry only when the reductive cycle is missing or when it contains only
a ketoreductase, as the dehydratase would abolish chirality. Second, the specificity 
of the ketoreductase may determine the chirality of any beta-OH. Finally, the 
enoylreductase specificity for substituted malonyls as extender units may influence 
the stereochemistry when there is a complete KR/DH/ER available. 
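
The fourth degree of freedom, the oxidation state set by the reductive domains
of each module, lends itself to a small lookup. The sketch below is a
simplification offered for illustration; the function name `beta_carbon_state`
and the rule that an incomplete domain set (for example DH without KR) is
rejected are assumptions of this example, not statements from the text.

```python
# Map the set of reductive domains present in a module to the functional
# group expected at the corresponding position of the polyketide chain.
_STATES = {
    frozenset(): "ketone",
    frozenset({"KR"}): "alcohol (beta-OH)",
    frozenset({"KR", "DH"}): "C-C double bond",
    frozenset({"KR", "DH", "ER"}): "C-C single bond",
}


def beta_carbon_state(domains):
    """Return the expected functional group for a module's reductive domains.

    `domains` is any iterable of the strings "KR", "DH", and "ER".
    Combinations that skip an earlier step of the reductive cycle are
    treated as errors in this simplified model.
    """
    key = frozenset(domains)
    if key not in _STATES:
        raise ValueError(f"unsupported domain combination: {sorted(key)}")
    return _STATES[key]


for combo in ([], ["KR"], ["KR", "DH"], ["KR", "DH", "ER"]):
    print(combo, "->", beta_carbon_state(combo))
```

The four entries are the same four outcomes that are counted per module when
the size of a combinatorial library is estimated.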

Thus, the modular PKS systems generally and the megalomicin PKS 
system particularly permit a wide range of polyketides to be synthesized. As 
compared to the aromatic PKS systems, the modular PKS systems accept a wider 
range of starter units, including aliphatic monomers (acetyl, propionyl, butyryl, 
isovaleryl, and the like), aromatics (aminohydroxybenzoyl), alicyclics
(cyclohexanoyl), and heterocyclics (thiazolyl). Certain modular PKSs have relaxed
specificity for their starter units (Kao et al., 1994, Science, supra). Modular PKSs
also exhibit considerable variety with regard to the choice of extender units in
each condensation cycle. The degree of beta-ketoreduction following a
condensation reaction can be altered by genetic manipulation (Donadio et al.,
1991, Science, supra; Donadio et al., 1993, Proc. Natl. Acad. Sci. USA 90: 7119-
7123). Likewise, the size of the polyketide product can be varied by designing
mutants with the appropriate number of modules (Kao et al., 1994, J. Am. Chem.
Soc. 116: 11612-11613). Lastly, modular PKS enzymes are particularly well
known for generating an impressive range of asymmetric centers in their products 
in a highly controlled manner. The polyketides, antibiotics, and other compounds 
produced by the methods of the invention are typically single stereoisomeric 
forms. Although the compounds of the invention can occur as mixtures of 
stereoisomers, it may be beneficial in some instances to generate individual 
stereoisomers. Thus, the combinatorial potential within modular PKS pathways 
based on any naturally occurring modular, such as the megalomicin, PKS scaffold 
is virtually unlimited. 
While hybrid PKSs are most often produced by "mixing and matching"
portions of PKS coding sequences, mutations in DNA encoding a PKS can also be
used to introduce, alter, or delete an activity in the encoded polypeptide. Mutations
can be made to the native sequences using conventional techniques. The substrates
for mutation can be an entire cluster of genes or only one or two of them; the
substrate for mutation may also be portions of one or more of these genes.
Techniques for mutation include preparing synthetic oligonucleotides including
the mutations and inserting the mutated sequence into the gene encoding a PKS
subunit using restriction endonuclease digestion. See, e.g., Kunkel, 1985, Proc.
Natl. Acad. Sci. USA 82: 448; Geisselsoder et al., 1987, BioTechniques 5: 786.
Alternatively, the mutations can be effected using a mismatched primer (generally
10-20 nucleotides in length) that hybridizes to the native nucleotide sequence, at a
temperature below the melting temperature of the mismatched duplex. The primer
can be made specific by keeping primer length and base composition within
relatively narrow limits and by keeping the mutant base centrally located. See
Zoller and Smith, 1983, Methods Enzymol. 100: 468. Primer extension is effected
using DNA polymerase, the product cloned, and clones containing the mutated
DNA, derived by segregation of the primer extended strand, selected.
Identification can be accomplished using the mutant primer as a hybridization
probe. The technique is also applicable for generating multiple point mutations.
See, e.g., Dalbie-McFarland et al., 1982, Proc. Natl. Acad. Sci. USA 79: 6409.
PCR mutagenesis can also be used to effect the desired mutations.
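
The primer geometry described above (a short oligonucleotide with the mutant
base centrally located) can be sketched in a few lines of Python. This is a
string-handling illustration only: the function name `mismatch_primer`, the
default flank of nine bases (giving a 19-nucleotide primer), and the example
template are assumptions of this sketch, and no melting-temperature or
base-composition check is attempted.

```python
def mismatch_primer(template, position, new_base, flank=9):
    """Build a primer that carries one substituted base at its center.

    template : sequence of the strand to be altered (A/C/G/T)
    position : 0-based index of the base to change
    new_base : replacement base
    flank    : number of matching bases kept on each side of the mutation

    The primer is written in the same sense as `template`; in practice it
    would anneal to the complementary strand.
    """
    template = template.upper()
    new_base = new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if template[position] == new_base:
        raise ValueError("replacement base equals the native base")
    start, end = position - flank, position + flank + 1
    if start < 0 or end > len(template):
        raise ValueError("not enough flanking sequence around the position")
    return template[start:position] + new_base + template[position + 1:end]


# Hypothetical 36-nucleotide template used only to exercise the function.
template = "ATGGCCAAGCTTGGATCCGAATTCCTGCAGGTCGAC"
primer = mismatch_primer(template, position=15, new_base="G")
native_window = template[15 - 9:15 + 9 + 1]
mismatches = sum(a != b for a, b in zip(primer, native_window))
print(primer, len(primer), mismatches)
```

Keeping the substitution at the center leaves equal stretches of matched
sequence on both sides, which is the design consideration the passage
emphasizes.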

Random mutagenesis of selected portions of the nucleotide sequences
encoding enzymatic activities can also be accomplished by several different
techniques known in the art, e.g., by inserting an oligonucleotide linker randomly
into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating
incorrect nucleotides during in vitro DNA synthesis, by error-prone PCR
mutagenesis, by preparing synthetic mutants, or by damaging plasmid DNA in
vitro with chemicals, in accordance with the methods of the present invention.
Chemical mutagens include, for example, sodium bisulfite, nitrous acid,
nitrosoguanidine, hydroxylamine, agents which damage or remove bases thereby
preventing normal base-pairing such as hydrazine or formic acid, analogues of
nucleotide precursors such as 5-bromouracil, 2-aminopurine, or acridine
intercalating agents such as proflavine, acriflavine, quinacrine, and the like.
Generally, plasmid DNA or DNA fragments are treated with chemical mutagens,
transformed into E. coli and propagated as a pool or library of mutant plasmids.

In constructing a hybrid PKS of the invention, regions encoding enzymatic
activity, i.e., regions encoding corresponding activities from different PKS
synthases or from different locations in the same PKS, can be recovered, for
example, using PCR techniques with appropriate primers. By "corresponding"
activity encoding regions is meant those regions encoding the same general type of
activity. For example, a KR activity encoded at one location of a gene cluster
"corresponds" to a KR encoding activity in another location in the gene cluster or
in a different gene cluster. Similarly, a complete reductase cycle could be
considered corresponding. For example, KR/DH/ER can correspond to a KR
alone.

If replacement of a particular target region in a host PKS is to be made,
this replacement can be conducted in vitro using suitable restriction enzymes. The
replacement can also be effected in vivo using recombinant techniques involving
homologous sequences framing the replacement gene in a donor plasmid and a
receptor region in a recipient plasmid. Such systems, advantageously involving
plasmids of differing temperature sensitivities, are described, for example, in PCT
publication No. WO 96/40968, incorporated herein by reference. The vectors used
to perform the various operations to replace the enzymatic activity in the host PKS
genes or to support mutations in these regions of the host PKS genes can be
chosen to contain control sequences operably linked to the resulting coding
sequences in a manner such that expression of the coding sequences can be
effected in an appropriate host.

However, simple cloning vectors may be used as well. If the cloning
vectors employed to obtain PKS genes encoding derived PKS lack control
sequences for expression operably linked to the encoding nucleotide sequences,
the nucleotide sequences are inserted into appropriate expression vectors. This
need not be done individually, but a pool of isolated encoding nucleotide
sequences can be inserted into expression vectors, the resulting vectors
transformed or transfected into host cells, and the resulting cells plated out into
individual colonies. The invention provides a variety of recombinant DNA
compounds in which the various coding sequences for the domains and modules
of the megalomicin PKS are flanked by non-naturally occurring restriction enzyme
recognition sites.

The various PKS nucleotide sequences can be cloned into one or more
recombinant vectors as individual cassettes, with separate control elements, or
under the control of, e.g., a single promoter. The PKS subunit encoding regions
can include flanking restriction sites to allow for the easy deletion and insertion of
other PKS subunit encoding sequences so that hybrid PKSs can be generated. The
design of such unique restriction sites is known to those of skill in the art and can
be accomplished using the techniques described above, such as site-directed
mutagenesis and PCR.

The expression vectors containing nucleotide sequences encoding a variety
of PKS enzymes for the production of different polyketides are then transformed
into the appropriate host cells to construct the library. In one straightforward
approach, a mixture of such vectors is transformed into the selected host cells and
the resulting cells plated into individual colonies and selected to identify
successful transformants. Each individual colony has the ability to produce a
particular PKS synthase and ultimately a particular polyketide. Typically, there
will be duplications in some, most, or all of the colonies; the subset of the
transformed colonies that contains a different PKS in each member colony can be
considered the library. Alternatively, the expression vectors can be used
individually to transform hosts, which transformed hosts are then assembled into a
library. A variety of strategies are available to obtain a multiplicity of colonies
each containing a PKS gene cluster derived from the naturally occurring host gene
cluster so that each colony in the library produces a different PKS and ultimately a
different polyketide. The number of different polyketides that are produced by the
library is typically at least four, more typically at least ten, and preferably at least
20, and more preferably at least 50, reflecting similar numbers of different altered
PKS gene clusters and PKS gene products. The number of members in the library
is arbitrarily chosen; however, the degrees of freedom outlined above with respect
to the variation of starter, extender units, stereochemistry, oxidation state, and
chain length enable the production of quite large libraries.
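
The notion that the library is the subset of colonies in which each member
carries a different PKS can be illustrated with a short de-duplication routine.
The colony identifiers and genotype labels below are hypothetical, and the
choice to keep the first colony seen for each genotype is an assumption of this
sketch.

```python
def distinct_library(colonies):
    """Reduce a list of (colony_id, pks_genotype) pairs to one per genotype.

    Returns a dict mapping each distinct genotype to the first colony in
    which it was observed; later duplicates are ignored.
    """
    library = {}
    for colony_id, genotype in colonies:
        library.setdefault(genotype, colony_id)
    return library


# Hypothetical transformants from a mixed-vector transformation.
colonies = [
    ("c01", "AT3-malonyl"),
    ("c02", "AT5-malonyl"),
    ("c03", "AT3-malonyl"),      # duplicate of c01
    ("c04", "KR2-deleted"),
    ("c05", "AT5-malonyl"),      # duplicate of c02
]

library = distinct_library(colonies)
print(len(library), library)
```

Here five colonies collapse to three library members, one for each distinct
PKS.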
Methods for introducing the recombinant vectors of the invention into
suitable hosts are known to those of skill in the art and typically include the use of
CaCl2 or agents such as other divalent cations, lipofection, DMSO, protoplast
transformation, conjugation, infection, transfection, and electroporation. The
polyketide producing colonies can be identified and isolated using known
techniques and the produced polyketides further characterized. The polyketides
produced by these colonies can be used collectively in a panel to represent a
library or may be assessed individually for activity.

The libraries of the invention can thus be considered at four levels: (1) a
multiplicity of colonies each with a different PKS encoding sequence; (2) the
proteins produced from the coding sequences; (3) the polyketides produced from
the proteins assembled into a functional PKS; and (4) antibiotics or compounds
with other desired activities derived from the polyketides. Of course, combination
libraries can also be constructed wherein members of a library derived, for
example, from the megalomicin PKS can be considered as a part of the same
library as those derived from, for example, the rapamycin PKS or DEBS.

Colonies in the library are induced to produce the relevant synthases and
thus to produce the relevant polyketides to obtain a library of polyketides. The
polyketides secreted into the media can be screened for binding to desired targets,
such as receptors, signaling proteins, and the like. The supernatants per se can be
used for screening, or partial or complete purification of the polyketides can first
be effected. Typically, such screening methods involve detecting the binding of
each member of the library to a receptor or other target ligand. Binding can be
detected either directly or through a competition assay. Means to screen such
libraries for binding are well known in the art and can be applied in accordance
with the methods of the present invention. Alternatively, individual polyketide
members of the library can be tested against a desired target. In this event, screens
wherein the biological response of the target is measured can more readily be
included. Antibiotic activity can be verified using typical screening assays such as
those set forth in Lehrer et al., 1991, J. Immunol. Meth. 137: 167-173, incorporated
herein by reference, and in the Examples below.

The invention provides methods for the preparation of a large number of
polyketides. These polyketides are useful intermediates in formation of
compounds with antibiotic or other activity through hydroxylation, epoxidation,
and glycosylation reactions as described above. In general, the polyketide products
of the PKS must be further modified, typically by hydroxylation and glycosylation,
to exhibit potent antibiotic activity. Hydroxylation results in the novel polyketides
of the invention that contain hydroxyl groups at C-6, which can be accomplished
using the hydroxylase encoded by the eryF gene, and/or C-12, which can be
accomplished using the hydroxylase encoded by the picK or eryK gene. Also, the
oleP gene is available in recombinant form, which can be used to express the oleP
gene product in any host cell. A host cell, such as a Streptomyces host cell or a
Saccharopolyspora erythraea host cell, modified to express the oleP gene thus can
be used to produce polyketides comprising the C-8-C-8a epoxide present in
oleandomycin. Thus the invention provides such modified polyketides. The
presence of hydroxyl groups at these positions can enhance the antibiotic activity
of the resulting compound relative to its unhydroxylated counterpart.

Methods for glycosylating polyketides are generally known in the art and
can be applied in accordance with the methods of the present invention; the
glycosylation may be effected intracellularly by providing the appropriate
glycosylation enzymes or may be effected in vitro using chemical synthetic means
as described herein and in PCT publication No. WO 98/49315, incorporated
herein by reference. Preferably, glycosylation with desosamine, mycarose, and/or
megosamine is effected in accordance with the methods of the invention in
recombinant host cells provided by the invention. In general, the approaches to
effecting glycosylation mirror those described above with respect to
hydroxylation. The purified enzymes, isolated from native sources or
recombinantly produced, may be used in vitro. Alternatively and as noted,
glycosylation may be effected intracellularly using endogenous or recombinantly
produced intracellular glycosylases. In addition, synthetic chemical methods may
be employed.

The antibiotic modular polyketides may contain any of a number of
different sugars, although D-desosamine, or a close analog thereof, is most
common. Erythromycin, picromycin, megalomicin, narbomycin, and methymycin
contain desosamine. Erythromycin also contains L-cladinose (3-O-methyl
mycarose). Tylosin contains mycaminose (4-hydroxy desosamine), mycarose and
6-deoxy-D-allose. 2-acetyl-1-bromodesosamine has been used as a donor to
glycosylate polyketides by Masamune et al., 1975, J. Am. Chem. Soc. 97: 3512-
3513. Other, apparently more stable donors include glycosyl fluorides,
thioglycosides, and trichloroacetimidates; see Woodward et al., 1981, J. Am.
Chem. Soc. 103: 3215; Martin et al., 1997, J. Am. Chem. Soc. 119: 3193; Toshima
et al., 1995, J. Am. Chem. Soc. 117: 3717; Matsumoto et al., 1988, Tetrahedron
Lett. 29: 3575. Glycosylation can also be effected using the polyketide aglycones
as starting materials and using Saccharopolyspora erythraea or Streptomyces
venezuelae or other host cell to make the conversion, preferably using mutants
unable to synthesize macrolides, as discussed in the preceding Section.

Thus, a wide variety of polyketides can be produced by the hybrid PKS 
enzymes of the invention. These polyketides are useful as antibiotics and as 
intermediates in the synthesis of other useful compounds, as described in the 
following section. 

Section VII: Host Cells Containing Multiple Expression Vectors 

A recombinant host cell of the invention may contain nucleic acid
encoding a megalomicin PKS domain, module, or protein, or megalomicin
modification enzyme at a single genetic locus, e.g., on a single plasmid or at a
single chromosomal locus, or at different genetic loci, e.g., on separate plasmids
and/or chromosomal loci. By "multiple" is meant two or more; by "vector" is
meant a nucleic acid molecule which can be used to transform host systems and
which contains an independent expression system containing a coding sequence
under control of a promoter and optionally a selectable marker and any other
suitable sequences regulating expression. Typical such vectors are plasmids, but
other vectors such as phagemids, cosmids, viral vectors and the like can be used
according to the nature of the host. Of course, one or more of the separate vectors
may integrate into the chromosome of the host (selection may not be required for
maintenance of integrated vectors).

In one embodiment, the invention provides a recombinant host cell, which
comprises at least two separate autonomously replicating recombinant DNA
expression vectors, each of said vectors comprising a recombinant DNA compound
encoding a megalomicin PKS domain or a megalomicin modification enzyme
operably linked to a promoter. In another embodiment, the invention provides a
recombinant host cell, which comprises at least one autonomously replicating
recombinant DNA expression vector and at least one modified chromosome, each
of said vector(s) and each of said modified chromosome(s) comprising a
recombinant DNA compound encoding a megalomicin PKS domain or a megalomicin
modification enzyme operably linked to a promoter. Preferably, the autonomously
replicating recombinant DNA expression vector and/or the modified chromosome
further comprises distinct selectable markers.

The above multiple-vector (chromosome) expression systems can also be
used for expressing heterogeneous polyketide biosynthetic enzymes, e.g., for
expressing Micromonospora megalomicea megalomicin PKS protein, module, or
domain or a megalomicin modification enzyme with a PKS protein, module, or
domain, or modification enzyme from other origins in the same host cells. By
placing various activities on different expression vectors, a high degree of
variation can be achieved in an efficient manner. A variety of hosts can be used;
any suitable host cell that can maintain multiple vectors can readily be used.
Preferred hosts include Streptomyces, yeast, E. coli, other actinomycetes, and plant
cells; mammalian or insect cells or other suitable recombinant hosts can also
be used. Preferred among yeast strains are Saccharomyces cerevisiae and Pichia
pastoris. Preferred actinomycetes include various strains of Streptomyces.

If one chooses to use a host cell that does not naturally produce a
polyketide, then one may need to ensure that the recombinant host is modified to
also contain a holo ACP synthase activity that effects pantetheinylation of the acyl
carrier protein. See PCT Pub. No. WO 97/13845, incorporated herein by
reference. One of the multiple vectors may be used for this purpose. This
pantetheinylation step is necessary for activation of the ACP. The expression
system for the holo ACP synthase may be supplied on a vector separate from that
carrying a PKS coding sequence or may be supplied on the same vector or may be
integrated into the chromosome of the host, or may be supplied as an expression
system for a fusion protein with all or a portion of a polyketide synthase (see U.S.
Patent No. 6,033,883, incorporated herein by reference).

It should be noted that in some recombinant hosts, it may also be necessary
to activate the polyketides produced through postsynthesis modifications when
polyketides having such modifications are desired. If this is the case for a
particular host, the host will be modified, for example by transformation, to
contain those enzymes necessary for effecting these modifications. Among such
enzymes, for example, are glycosylation enzymes. The use of multiple vectors can
facilitate the introduction of expression systems for such enzymes.

In a preferred embodiment, the multiple vector system is used to assemble
rapidly and efficiently a combinatorial library of polyketides and the
PKS/modification enzymes that produce them. In an illustrative embodiment, the
multiple vector system comprises four different vectors, one comprising the megAI
gene, one the megAII gene, one the megAIII gene, and one the modification
enzyme(s) gene(s). Each of these vectors can be modified to make a set of vectors.
For example, one set could contain all possible AT substitutions in the loading and
first and second extender modules of the megAI gene product. Another set could
contain expression systems for a variety of different modification enzymes. With
these four vector sets and by combining each member of each set with each
member of the other three sets, a very large library of cells, vector sets, and
polyketides can be rapidly and efficiently assembled.
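
Combining each member of each vector set with each member of the other three
sets is a Cartesian product, which can be enumerated as follows. The variant
names in the four sets are hypothetical placeholders chosen for this sketch;
only the four-set layout comes from the description above.

```python
from itertools import product

# Hypothetical members of the four vector sets.
megAI_set = ["megAI-native", "megAI-AT1-swap", "megAI-AT2-swap"]
megAII_set = ["megAII-native", "megAII-AT3-swap"]
megAIII_set = ["megAIII-native", "megAIII-AT5-swap"]
modification_set = ["no-modification", "hydroxylase", "glycosylation-enzymes"]


def build_library(*vector_sets):
    """Return every combination that takes one vector from each set."""
    return list(product(*vector_sets))


library = build_library(megAI_set, megAII_set, megAIII_set, modification_set)

print(len(library))   # 3 * 2 * 2 * 3 = 36 combinations
print(library[0])
```

The library size is the product of the set sizes, so adding a single new
variant to any one set multiplies, rather than adds to, the number of host
cells that can be assembled.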

The combinatorial potential of a modular PKS such as the megalomicin
PKS (ignoring the additional potential of different modification enzyme systems)
is minimally given by: $AT_L \times (AT_E \times 4)^M$, where $AT_L$ is the number
of loading acyl transferases, $AT_E$ is the number of extender acyl transferases,
and $M$ is the number of modules in the gene cluster. The number 4 is present in
the formula because this represents the number of ways a keto group can be
modified by either 1) no reaction; 2) KR activity alone; 3) KR+DH activity; or
4) KR+DH+ER activity. It has been shown that expression of only the first two
modules of the erythromycin PKS resulted in the production of a predicted
truncated triketide product (See Kao et al., J. Am. Chem. Soc., 116:11612-11613
(1994)). A novel 12-membered macrolide similar to methymycin aglycone was
produced by expression of modules 1-5 of this PKS in S. coelicolor (See Kao et
al., J. Am. Chem. Soc., 117:9105-9106 (1995)). This work shows that PKS modules
are functionally independent so that lactone ring size can be controlled by the
number of modules present.
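
As a worked illustration of the minimal combinatorial estimate given above
(the number of loading acyl transferases multiplied by the M-th power of four
times the number of extender acyl transferases), the short function below
evaluates it for chosen inputs. The function name and the example input values
are assumptions made for this sketch; they are not figures reported in the
specification.

```python
def minimal_combinations(n_loading_at, n_extender_at, n_modules,
                         reduction_outcomes=4):
    """Evaluate AT_L * (AT_E * 4) ** M.

    reduction_outcomes defaults to 4: no reaction, KR alone, KR+DH,
    or KR+DH+ER at each module.
    """
    return n_loading_at * (n_extender_at * reduction_outcomes) ** n_modules


# One loading AT and one extender AT choice across six modules:
print(minimal_combinations(1, 1, 6))   # 4 ** 6 = 4096

# Two loading ATs and two extender AT choices across six modules:
print(minimal_combinations(2, 2, 6))   # 2 * 8 ** 6 = 524288
```

Even with only two acyl transferase choices at each position, a six-module
scaffold yields more than half a million nominal combinations, which is why
the estimate is described as a minimum that ignores modification enzymes.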
In addition to controlling the number of modules, the modules can be
genetically modified, for example, by the deletion of a ketoreductase domain as
described by Donadio et al., Science, 252:675-679 (1991); and Donadio et al.,
Gene, 115:97-103 (1992). In addition, the mutation of an enoyl reductase domain
was reported by Donadio et al., Proc. Natl. Acad. Sci., 90:7119-7123 (1993).
These modifications also resulted in modified PKS and thus modified polyketides.

As stated above, in the present invention, the coding sequences for
catalytic activities derived from the megalomicin PKS systems found in nature can
be used in their native forms or modified by standard mutagenesis techniques to
delete or diminish activity or to introduce an activity into a module in which it was
not originally present. For example, a KR activity can be introduced into a
module normally lacking that function.

In one embodiment of the invention herein, a single host cell is modified to
contain a multiplicity of vectors, each vector contributing a portion of the
synthesis of a megalomicin PKS and modification enzyme (if any) system. Each
of the multiple vectors for production of the megalomicin PKS system typically
encodes at least two modules, and at least one of the vectors integrates into the
chromosome of the host. Integration can be effected using suitable phage or
integrating vectors or by homologous recombination. If homologous
recombination is used, the integration event may also be designed to delete
endogenous PKS genes residing in the chromosome, as described in the PCT
application WO 95/08548. In these embodiments, too, a selectable marker such as
hygromycin or thiostrepton resistance can be included in the vector that effects
integration.

As mentioned above, additional enzymes that effect post-translational
modifications to the enzyme systems in the megalomicin PKS may be introduced
into the host through suitable recombinant expression systems. In addition,
enzymes that activate the polyketides themselves, for example, through
glycosylation may be added. It may also be desirable to modify the cell to produce
more of a particular substrate utilized in polyketide biosynthesis. For example, it
is generally believed that malonyl CoA levels in yeast are higher than
methylmalonyl CoA; if yeast is chosen as a host, it may be desirable to increase
methylmalonyl CoA levels by the addition of one or more biosynthetic enzymes
therefor.

The multiple-vector expression system can also be used to make
polyketides produced by the addition of synthetic starter units to a PKS that
contains an inactivated ketosynthase (KS) in the first module. As noted above,
this modification permits the system to incorporate a suitable diketide thioester
such as 3-hydroxy-2-methyl pentanoic acid-N-acetyl cysteamine thioester, or
similar thioesters of diketide analogs, as described by Jacobsen et al., Science,
277:367-369 (1997). The construction of PKS modules containing inactivated
ketosynthase regions can be conducted by methods known in the art, such as the
method described in U.S. Patent No. 6,080,555 and PCT publication Nos. WO
99/03986 and WO 97/02358, each of which is incorporated herein by reference, in
accordance with the methods of the present invention.

The multiple-vector expression system can be used to produce polyketides
in hosts that normally do not produce them, such as E. coli and yeast. It also
provides more efficient means to provide a variety of polyketide products by 
supplying the elements of the introduced PKS, whether in an E. coli or yeast host 
or in other more traditionally used hosts, such as Streptomyces. The invention 
also includes libraries of polyketides prepared using the methods of the invention. 


Section VIII: Compounds 

The methods and recombinant DNA compounds of the invention are useful
in the production of polyketides. In one important aspect, the invention provides
methods for making antibiotic compounds related in structure to erythromycin, a
potent antibiotic compound. The invention also provides novel ketolide
compounds, polyketide compounds with potent antibiotic activity of significant
interest due to activity against antibiotic resistant strains of bacteria. See
Griesgraber et al., 1996, J. Antibiot. 49: 465-477, incorporated herein by
reference. Most if not all of the ketolides prepared to date are synthesized using
erythromycin A, a derivative of 6-dEB, as an intermediate. In one embodiment,
the present invention provides the 3-keto derivatives of the megalomicins for use
as antibiotics. In particular, the 3-keto derivative of megalomicin A is a preferred
ketolide of the invention. These compounds can be made chemically, substantially
in accordance with the procedures for making ketolides described in the prior art,
or in recombinant host cells of the invention in which the megosamine and
desosamine biosynthetic and transferase genes are present but which do not make
or transfer the mycarose moiety and/or in which the PKS has been modified to
delete the KR domain of extender module 6. The invention also provides methods for
making intermediates useful in preparing traditional, 6-dEB- and erythromycin-
derived ketolide compounds. See Griesgraber et al., supra, Agouridas et al., 1998,
J. Med. Chem. 41: 4080-4100, U.S. Patent Nos. 5,770,579; 5,760,233; 5,750,510;
5,747,467; 5,747,466; 5,656,607; 5,635,485; 5,614,614; 5,556,118; 5,543,400;
5,527,780; 5,444,051; 5,439,890; 5,439,889; and PCT publication Nos. WO
98/09978 and 98/28316, each of which is incorporated herein by reference.

As noted above, the hybrid PKS genes of the invention can be expressed in
a host cell that contains the desosamine, megosamine, and/or mycarose
biosynthetic genes and corresponding transferase genes as well as the required
hydroxylase gene(s), which may, for example and without limitation, be either
picK, megK, or eryK (for the C-12 position) and/or megF or eryF (for the C-6
position). The resulting compounds have antibiotic activity but can be further
modified, as described in the patent publications referenced above, to yield a
desired compound with improved or otherwise desired properties. Alternatively,
the aglycone compounds can be produced in the recombinant host cell, and the
desired glycosylation and hydroxylation steps carried out in vitro or in vivo, in the
latter case by supplying the converting cell with the aglycone, as described above.

The compounds of the invention are thus optionally glycosylated forms of
the polyketide set forth in formula (1) below which are hydroxylated at either the
C-6 or the C-12 or both. The compounds of formula (1) can be prepared using the
loading and the six extender modules of a modular PKS, modified or prepared in
hybrid form as herein described. These polyketides have the formula:
including the glycosylated and isolated stereoisomeric forms thereof;

wherein R* is a straight chain, branched or cyclic, saturated or unsaturated
substituted or unsubstituted hydrocarbyl of 1-15C;

each of R¹-R⁶ is independently H or alkyl (1-4C) wherein any alkyl at R¹
may optionally be substituted;

each of X¹-X⁵ is independently two H, H and OH, or =O; or

each of X¹-X⁵ is independently H and the compound of formula (2)
contains a double-bond in the ring adjacent to the position of said X at 2-3, 4-5,
6-7, 8-9 and/or 10-11;

with the proviso that:

at least two of R¹-R⁶ are alkyl (1-4C).

Preferred compounds comprising formula 2 are those wherein at least three
of R¹-R⁵ are alkyl (1-4C), preferably methyl or ethyl; more preferably wherein at
least four of R¹-R⁵ are alkyl (1-4C), preferably methyl or ethyl. Also preferred are
those wherein X² is two H, =O, or H and OH, and/or X³ is H, and/or X¹ is OH
and/or X⁴ is OH and/or X⁵ is OH. Also preferred are compounds with variable R*
when R¹-R⁵ is methyl, X² is =O, and X¹, X⁴ and X⁵ are OH. The glycosylated
forms (i.e., mycarose or cladinose at C-3, desosamine at C-5, and/or megosamine
at C-6) of the foregoing are also preferred.

As described above, there are a wide variety of diverse organisms that can
modify compounds such as those described herein to provide compounds that
have, or can be readily modified to have, useful activities. For example,
Saccharopolyspora erythraea can convert 6-dEB to a variety of useful
compounds. The compounds provided by the present invention can be provided to
cultures of Saccharopolyspora erythraea and converted to the corresponding
derivatives of erythromycins A, B, C, and D in accordance with the procedure
provided in the Examples, below. To ensure that only the desired compound is
produced, one can use an S. erythraea eryA mutant that is unable to produce
6-dEB but can still carry out the desired conversions (Weber et al., 1985, J.
Bacteriol. 164(1): 425-433). Also, one can employ other mutant strains, such as
eryB, eryC, eryG, and/or eryK mutants, or mutant strains having mutations in
multiple genes, to accumulate a preferred compound. The conversion can also be
carried out in large fermentors for commercial production. Each of the
erythromycins A, B, C, and D has antibiotic activity, although erythromycin A has
the highest antibiotic activity. Moreover, each of these compounds can form,
under treatment with mild acid, a C-6 to C-9 hemiketal with motilide activity. For
formation of hemiketals with motilide activity, erythromycins B, C, and D are
preferred, as the presence of a C-12 hydroxyl allows the formation of an inactive
compound that has a hemiketal formed between C-9 and C-12.

Thus, the present invention provides the compounds produced by
hydroxylation and glycosylation of the compounds of the invention by action of
the enzymes endogenous to Saccharopolyspora erythraea and mutant strains of S.
erythraea. Such compounds are useful as antibiotics or as motilides directly or
after chemical modification. For use as antibiotics, the compounds of the
invention can be used directly without further chemical modification.
Erythromycins A, B, C, and D all have antibiotic activity, and the corresponding
compounds of the invention that result from the compounds being modified by
Saccharopolyspora erythraea also have antibiotic activity. These compounds can
be chemically modified, however, to provide other compounds of the invention
with potent antibiotic activity. For example, alkylation of erythromycin at the C-6
hydroxyl can be used to produce potent antibiotics (clarithromycin is C-6-O-
methyl), and other useful modifications are described in, for example, Griesgraber
et al., 1996, J. Antibiot. 49: 465-477, Agouridas et al., 1998, J. Med. Chem. 41:
4080-4100, U.S. Patent Nos. 5,770,579; 5,760,233; 5,750,510; 5,747,467;
5,747,466; 5,656,607; 5,635,485; 5,614,614; 5,556,118; 5,543,400; 5,527,780;
5,444,051; 5,439,890; and 5,439,889; and PCT publication Nos. WO 98/09978
and 98/28316, each of which is incorporated herein by reference.

For use as motilides, the compounds of the invention can be used directly
without further chemical modification. Erythromycin and certain erythromycin
analogs are potent agonists of the motilin receptor that can be used clinically as
prokinetic agents to induce phase III of migrating motor complexes, to increase
esophageal peristalsis and LES pressure in patients with GERD, to accelerate
gastric emptying in patients with gastric paresis, and to stimulate gall bladder
contractions in patients after gallstone removal and in diabetics with autonomic
neuropathy. See Peeters, 1999, Motilide Web Site,
http://www.med.kuleuven.ac.be/med/gih/motilid.htm, and Omura et al., 1987,
Macrolides with gastrointestinal motor stimulating activity, J. Med. Chem. 30:
1941-3. The corresponding compounds of the invention that result from the
compounds of the invention being modified by Saccharopolyspora erythraea also
have motilide activity, particularly after conversion, which can also occur in vivo,
to the C-6 to C-9 hemiketal by treatment with mild acid. Compounds lacking the
C-12 hydroxyl are especially preferred for use as motilin agonists. These
compounds can also be further chemically modified, however, to provide other
compounds of the invention with potent motilide activity.

Moreover, and also as noted above, there are other useful organisms that
can be employed to hydroxylate and/or glycosylate the compounds of the
invention. As described above, the organisms can be mutants unable to produce
the polyketide normally produced in that organism, the fermentation can be carried
out on plates or in large fermentors, and the compounds produced can be
chemically altered after fermentation. In addition to Saccharopolyspora erythraea,
Streptomyces venezuelae, S. narbonensis, S. antibioticus, Micromonospora
megalomicea, S. fradiae, and S. thermotolerans can also be used. In addition to
antibiotic activity, compounds of the invention produced by treatment with M.
megalomicea enzymes can have antiparasitic activity as well. Thus, the present
invention provides the compounds produced by hydroxylation and glycosylation
by action of the enzymes endogenous to S. erythraea, S. venezuelae, S.
narbonensis, S. antibioticus, M. megalomicea, S. fradiae, and S. thermotolerans.

The present invention also provides methods and genetic constructs for
producing the glycosylated and/or hydroxylated compounds of the invention
directly in the host cell of interest. Thus, the recombinant genes of the invention,
which include recombinant megAI, megAII, and megAIII genes with one or more
deletions and/or insertions, including replacements of a megA gene fragment with
a gene fragment from a heterologous PKS gene, can be included on expression
vectors suitable for expression of the encoded gene products in
Saccharopolyspora erythraea, Micromonospora megalomicea, S. venezuelae, S.
narbonensis, S. antibioticus, S. fradiae, and S. thermotolerans.

The compounds of the invention can be produced by growing and
fermenting the host cells of the invention under conditions known in the art for the
production of other polyketides. The compounds of the invention can be isolated
from the fermentation broths of these cultured cells and purified by standard
procedures. The compounds can be readily formulated to provide the
pharmaceutical compositions of the invention. The pharmaceutical compositions
of the invention can be used in the form of a pharmaceutical preparation, for
example, in solid, semisolid, or liquid form. This preparation will contain one or
more of the compounds of the invention as an active ingredient in admixture with
an organic or inorganic carrier or excipient suitable for external, enteral, or
parenteral application. The active ingredient may be compounded, for example,
with the usual non-toxic, pharmaceutically acceptable carriers for tablets, pellets,
capsules, suppositories, solutions, emulsions, suspensions, and any other form
suitable for use.

The carriers which can be used include water, glucose, lactose, gum acacia,
gelatin, mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin,
colloidal silica, potato starch, urea, and other carriers suitable for use in
manufacturing preparations, in solid, semi-solid, or liquefied form. In addition,
auxiliary stabilizing, thickening, and coloring agents and perfumes may be used.
For example, the compounds of the invention may be utilized with hydroxypropyl
methylcellulose essentially as described in U.S. Patent No. 4,916,138,
incorporated herein by reference, or with a surfactant essentially as described in
EPO patent publication No. 428,169, incorporated herein by reference.

Oral dosage forms may be prepared essentially as described by Hondo et
al., 1987, Transplantation Proceedings XIX, Supp. 6: 17-22, incorporated herein
by reference. Dosage forms for external application may be prepared essentially as
described in EPO patent publication No. 423,714, incorporated herein by
reference. The active compound is included in the pharmaceutical composition in
an amount sufficient to produce the desired effect upon the disease process or
condition.

For the treatment of conditions and diseases caused by infection, a
compound of the invention may be administered orally, topically, parenterally, by
inhalation spray, or rectally in dosage unit formulations containing conventional
non-toxic pharmaceutically acceptable carriers, adjuvants, and vehicles. The term
parenteral, as used herein, includes subcutaneous injections, and intravenous,
intramuscular, and intrasternal injection or infusion techniques.

Dosage levels of the compounds of the invention are of the order from
about 0.01 mg to about 50 mg per kilogram of body weight per day, preferably
from about 0.1 mg to about 10 mg per kilogram of body weight per day. The
dosage levels are useful in the treatment of the above-indicated conditions (from
about 0.7 mg to about 3.5 gm per patient per day, assuming a 70 kg patient). In
addition, the compounds of the invention may be administered on an intermittent
basis, i.e., at semi-weekly, weekly, semi-monthly, or monthly intervals.
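
The per-patient figures follow from multiplying a per-kilogram dose by body weight. The following is a minimal Python sketch of that arithmetic only, assuming the 70 kg patient named in the text; the function name is illustrative and is not part of the specification. For the broad range of 0.01 to 50 mg/kg/day it gives roughly 0.7 mg to 3.5 g per patient per day.

```python
def daily_dose_mg(dose_mg_per_kg, body_weight_kg=70.0):
    """Convert a per-kilogram daily dose into a per-patient daily dose in mg.

    The 70 kg default is the patient weight assumed in the text.
    """
    return dose_mg_per_kg * body_weight_kg


# Broad range stated in the text: about 0.01 to about 50 mg/kg/day.
broad_low = daily_dose_mg(0.01)
broad_high = daily_dose_mg(50)

# Preferred range stated in the text: about 0.1 to about 10 mg/kg/day.
preferred_low = daily_dose_mg(0.1)
preferred_high = daily_dose_mg(10)

print(f"broad: {broad_low:.1f} mg to {broad_high / 1000:.1f} g per day")
print(f"preferred: {preferred_low:.0f} mg to {preferred_high:.0f} mg per day")
```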

The amount of active ingredient that may be combined with the carrier
materials to produce a single dosage form will vary depending upon the host
treated and the particular mode of administration. For example, a formulation
intended for oral administration to humans may contain from 0.5 mg to 5 gm of
active agent compounded with an appropriate and convenient amount of carrier
material, which may vary from about 5 percent to about 95 percent of the total
composition. Dosage unit forms will generally contain from about 0.5 mg to about
500 mg of active ingredient. For external administration, the compounds of the
invention may be formulated within the range of, for example, 0.00001% to 60%
by weight, preferably from 0.001% to 10% by weight, and most preferably from
about 0.005% to 0.8% by weight.

It will be understood, however, that the specific dose level for any 
particular patient will depend on a variety of factors. These factors include the
activity of the specific compound employed; the age, body weight, general health,
sex, and diet of the subject; the time and route of administration and the rate of
excretion of the drug; whether a drug combination is employed in the treatment;
and the severity of the particular disease or condition for which therapy is sought.

A detailed description of the invention having been provided above, the
following examples are given for the purpose of illustrating the invention and shall
not be construed as being a limitation on the scope of the invention or claims.

Example 1 

Cloning and Characterization of the Megalomicin Biosynthetic Gene Cluster from
Micromonospora megalomicea
Experimental Procedures 
Bacterial Strains, Media, and Growth Conditions 

Routine DNA manipulations were performed in Escherichia coli XL1 Blue
or E. coli XL1 Blue MR (Stratagene) using standard culture conditions (Sambrook
et al., 1989). M. megalomicea subsp. nigra NRRL3275 was obtained from the
ATCC collection and cultured according to recommended protocols. For isolation
of genomic DNA, M. megalomicea was grown in TSB (Hopwood et al., 1985) at
30°C. S. lividans K4-114 (Ziermann and Betlach, 1999), which carries a deletion
of the actinorhodin biosynthetic gene cluster, was used as the host for expression
of the megAI-AIII genes. S. lividans strains were maintained on R5 agar at 30°C
and grown in liquid YEME for preparation of protoplasts (Hopwood et al., 1985).
S. erythraea NRRL2338 was used for expression of the megosamine genes. S.
erythraea strains were maintained on R5 agar at 34°C and grown in liquid TSB for
preparation of protoplasts.

Manipulation of DNA and Organisms 

Manipulation and transformation of DNA in E. coli was performed by
standard procedures (Sambrook et al., 1989) or by suppliers' protocols. Protoplasts
of S. lividans and S. erythraea were generated for transformation by plasmid DNA
using the standard procedure. S. lividans transformants were selected on R5 using
2 ml of a 0.5 mg/ml thiostrepton overlay. S. erythraea transformants were selected
on R5 using 1.5 ml of a 0.6 mg/ml apramycin overlay.

Isolation of the meg gene cluster

A cosmid library was prepared in SuperCos (Stratagene) from M.
megalomicea total DNA partially digested with Sau3A I, and introduced into E.
coli using a Gigapack III XL (Stratagene) in vitro packaging kit. 32P-labelled DNA
probes encompassing the KS2 domain from ery DEBS, or a mixture of segments
encompassing modules 1 and 2 from ery DEBS, were used separately to screen the
cosmid library by colony hybridization. Several colonies which hybridized with
the probes were further analyzed by sequencing the ends of their cosmid inserts
using T3 and T7 primers. BLAST (Altschul et al., 1990) analysis of the sequences
revealed several colonies with DNA sequences highly homologous to genes from
the ery cluster. Together with restriction analysis, this led to the isolation of two
overlapping cosmids, pKOS079-93A and pKOS079-93D, which covered ~45 kb of
the meg cluster. A 400 bp PCR fragment was generated from the left end of
pKOS079-93D and used to reprobe the cosmid library. Likewise, a 200 bp PCR
fragment generated from the right end of pKOS079-93A was used to reprobe the
cosmid library. Analysis of hybridizing colonies as described above resulted in
identification of two additional cosmids, pKOS079-138B and pKOS79-124B,
which overlap the previous two cosmids. BLAST analysis of the far left and right
end sequences of these cosmids indicated no homology to any known genes
related to polyketide biosynthesis and therefore indicates that the set of four
cosmids spans the entire megalomicin biosynthetic gene cluster.

DNA sequencing and analysis

PCR-based double stranded DNA sequencing was performed on a
Beckman CEQ 2000 capillary sequencer using reagents and protocols provided by
the manufacturer. A shotgun library of the entire cosmid pKOS079-93D insert was
made as follows: DNA was first digested with Dra I to eliminate the vector
fragment, then partially digested with Sau3A I. After agarose electrophoresis,
bands between 1-3 kb were excised from the gel and ligated with BamH I digested
pUC19. Another shotgun library was generated from a 12 kb Xho I/EcoR I
fragment subcloned from cosmid pKOS079-93A to extend the sequence to the
megF gene. A 4 kb Bgl II/Xho I fragment from cosmid pKOS079-138B was
sequenced by primer walking to extend the sequencing to the megT gene.
Sequence was assembled using the Sequencher (Gene Codes Corp.) software package
and analyzed with MacVector (Oxford Molecular Group) and the NCBI BLAST
server (www.ncbi.nlm.nih.gov/BLAST/).


Plasmids 

Plasmid pKOS108-6 is a modified version of pKAO127'kan' (Ziermann
and Betlach, 1999; Ziermann and Betlach, 2000) in which the eryAI-III genes
between the Pac I and EcoR I sites have been replaced with the megAI-III genes.
This was done by first substituting a synthetic nucleotide DNA duplex
(5'-TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC (SEQ ID NO: 21),
complementary oligo
5'-AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT (SEQ ID NO: 22))
between the Pac I and EcoR I sites of the pKAO127'kan' vector fragment.
The 22 kb EcoR I/Bgl II fragment from cosmid pKOS079-93D containing the
megAI-II genes was inserted into EcoR I and Bgl II sites of the resulting plasmid to
generate pKOS024-84. A 12 kb Bgl II/BbvC I fragment containing the megAIII
and part of the megCII gene was subcloned from pKOS079-93A and excised as a
Bgl II/Xba I fragment and ligated into the corresponding sites of pKOS024-84 to
yield the final expression plasmid pKOS108-06.
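
The relationship between the two oligonucleotides can be checked computationally. The sketch below, written in Python purely for illustration, takes the two sequences exactly as printed (SEQ ID NO: 21 and 22), finds where the reverse complement of the first lies within the second, and reports the unpaired ends plus the positions of four recognition sequences. The helper names and the choice of sites to scan for are assumptions made for this example, not part of the described method.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Sequences as printed in the text, written 5' to 3'.
SEQ_ID_21 = "TAAGAATTCGGAGATCTGGCCTCAGCTCTAGAC"
SEQ_ID_22 = "AATTGTCTAGAGCTGAGGCCAGATCTCCGAATTCTTAAT"

# Standard recognition sequences of enzymes named in the plasmid construction.
SITES = {
    "EcoR I": "GAATTC",
    "Bgl II": "AGATCT",
    "BbvC I": "CCTCAGC",
    "Xba I": "TCTAGA",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def unpaired_ends(top, bottom):
    """Return the bottom-strand bases left unpaired at its 5' and 3' ends.

    Assumes the whole top strand pairs with a contiguous stretch of the
    bottom strand; raises ValueError if no such stretch exists.
    """
    paired = reverse_complement(top)
    start = bottom.find(paired)
    if start < 0:
        raise ValueError("strands are not complementary over the full top strand")
    return bottom[:start], bottom[start + len(paired):]


def site_positions(seq, sites=SITES):
    """Map each enzyme name to the 0-based index of its site (-1 if absent)."""
    return {name: seq.find(site) for name, site in sites.items()}


five_prime, three_prime = unpaired_ends(SEQ_ID_21, SEQ_ID_22)
print("unpaired 5' end:", five_prime)
print("unpaired 3' end:", three_prime)
print(site_positions(SEQ_ID_21))
```

Under these assumptions the second oligonucleotide carries a four-base unpaired 5' end and a two-base unpaired 3' end, which is what one would expect of a linker designed to ligate into a vector cut with two different enzymes.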

The megosamine integrating vector, pKOS97-42, was constructed as
follows: A subclone was generated containing the 4 kb Xho I/Sca I fragment from
pKOS79-138B together with the 1.7 kb Sca I/Pst I fragment from pKOS79-93D in
Litmus 28 (Stratagene). The entire 5.7 kb fragment was then excised as a Spe I/Pst
I fragment and combined with the 6.3 kb Pst I/EcoR I fragment from pKOS79-93D
and EcoR I/Xba I digested pSET152 (Bierman et al., 1992) to construct plasmid
pKOS97-42.
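
The fragment sizes quoted for this construction can be tallied as a consistency check. The short Python sketch below simply sums the stated sizes; the dictionary keys and function name are illustrative labels chosen for this example. The total of about 12 kb agrees with the 12 kb fragment described later in this Example as integrated into S. erythraea.

```python
# Fragment sizes in kb, as stated in the text for the pKOS97-42 insert.
SUBCLONE_FRAGMENTS_KB = {
    "Xho I/Sca I from pKOS79-138B": 4.0,
    "Sca I/Pst I from pKOS79-93D": 1.7,
}
ADDED_FRAGMENT_KB = {
    "Pst I/EcoR I from pKOS79-93D": 6.3,
}


def total_kb(*fragment_sets):
    """Sum fragment sizes across one or more name-to-size mappings."""
    total = sum(size for fragments in fragment_sets for size in fragments.values())
    return round(total, 1)


subclone_kb = total_kb(SUBCLONE_FRAGMENTS_KB)
insert_kb = total_kb(SUBCLONE_FRAGMENTS_KB, ADDED_FRAGMENT_KB)
print(f"subclone: {subclone_kb} kb, full insert: {insert_kb} kb")
```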

Production and analysis of secondary metabolites

Fermentation for production of polyketide, LC/MS analysis, and
quantification of 6-dEB for S. lividans K4-114/pKOS108-6 and S. lividans
K4-114/pKAO127'kan' were essentially as previously described (Xue et al., 1999). S.
erythraea NRRL2338 and S. erythraea/pKOS97-42 were grown for 6 days in F1
media (Brünker et al., 1998). Samples of broth were clarified in a microcentrifuge
(5 min, 13,000 rpm). For LC/MS preparation, isopropanol was added to the
supernatant (1:2 ratio) and centrifuged again. Erythromycins and megalomicins
were detected by electrospray mass spectrometry and quantity was determined by
evaporative light scattering detection (ELSD). The LC retention time and mass
spectra of erythromycin and megalomicins were identical to known standards.

Nucleotide sequence of the meg gene cluster 

A series of 4 overlapping inserts containing the meg cluster (Figure 9) were
isolated from a cosmid library prepared from total genomic DNA of M.
megalomicea and cover > 100 kb of the genome. A contiguous 48 kb segment
which encodes the megalomicin PKS and several deoxysugar biosynthetic genes
was sequenced and analyzed. The segment contains 17 complete ORFs as well as
an incomplete ORF at each end, organized as shown in Figure 9.

PKS genes. The ORFs megAI, megAII and megAIII encode the polyketide
synthase responsible for synthesis of 6-dEB. The enzyme complex, meg DEBS, is
highly similar to ery DEBS, with each of the three predicted polypeptides sharing
an average of 83% overall similarity with their ery PKS counterpart. Both PKSs
are composed of 6 modules (2 modules per polypeptide) and each module is
organized in the identical manner (Figure 9). A dendrogram analysis (Schwecke et
al., 1995) employing 70 acyltransferase (AT) domains revealed that the 6 meg
extender AT domains cluster with AT domains that incorporate methylmalonyl
CoA (not shown). The loading module of meg DEBS also lacks a KSQ domain
which is utilized by most macrolide PKSs for decarboxylation of the starter unit to
initiate polyketide synthesis (Bisang et al., 1999; Kuhstoss et al., 1996; Kakavas et
al., 1997; Xue et al., 1998), implying that priming begins with a propionate unit.
In addition, a conserved Gly to Pro substitution in the NADPH-binding region of
the ketoreductase (KR) domain of module 3 is observed in meg DEBS, which has
been proposed to account for its inactivity in ery DEBS (Donadio et al., 1991).

Deoxysugar genes. BLAST (Altschul et al., 1990) analysis of the genes
flanking the PKS indicated that 12 complete ORFs and 1 partial ORF appear to
encode functions required for synthesis of one of the three megalomicin
deoxysugars. Assignment of each ORF to a specific deoxysugar pathway was
made based on comparison to the ery genes and other related genes involved in
deoxysugar biosynthesis (Table 2).

Table 2. Deduced functions of genes identified in the megalomicin gene cluster.

Gene    | Closest Match (polypeptide) (a) | % Sim (a) | Proposed Pathway    | Proposed Function          | Reference
--------+---------------------------------+-----------+---------------------+----------------------------+--------------------------------------------------------
megT    | EryBVI                          |           | Mycarose/Megosamine | 2,3-Dehydratase            | (Summers et al., 1997; Gaisser et al., 1997)
megDVI  | EryCII                          | 63        | Megosamine          | 3,4-Isomerase              | (Summers et al., 1997)
megDI   | EryCIII                         | 79        | Megosamine          | Glycosyltransferase        | (Summers et al., 1997)
megY    | AcyA (S. thermotolerans)        | 52        |                     | Mycarose O-acyltransferase | (Arisawa et al., 1994)
megDII  | EryCI                           | 58        | Megosamine          | Aminotransferase           | (Dhillon et al., 1989; Summers et al., 1997)
megDIII | DesVI (S. venezuelae)           | 61        | Megosamine          | Dimethyltransferase        | (Xue et al., 1998)
megDIV  | DmnU (S. peucetius)             | 65        | Megosamine          | 3,5-Epimerase              | (Olano et al., 1999)
megDV   | Dehydrogenase (A. orientalis)   | 61        | Megosamine          | 4-Ketoreductase            | (Summers et al., 1997; van Wageningen et al., 1998)
megDVII | EryBII                          | 73        | Megosamine          | 2,3-Reductase              | (Summers et al., 1997)
megBV   | EryBV                           | 86        | Mycarose            | Glycosyltransferase        | (Summers et al., 1997; Gaisser et al., 1997)
megBIV  | EryBIV                          | 80        | Mycarose            | 4-Ketoreductase            | (Summers et al., 1997; Gaisser et al., 1997)
megAI   | EryAI                           | 81        | 6-dEB               | Polyketide Synthase        | (Donadio and Katz, 1992)
megAII  | EryAII                          | 85        | 6-dEB               | Polyketide Synthase        | (Donadio and Katz, 1992)
megAIII | EryAIII                         | 83        | 6-dEB               | Polyketide Synthase        | (Donadio and Katz, 1992)
megCII  | EryCII                          | 82        | Desosamine          | 3,4-Isomerase              | (Summers et al., 1997)
megCIII | EryCIII                         | 89        | Desosamine          | Glycosyltransferase        | (Summers et al., 1997)
megBII  | EryBII                          | 87        | Mycarose            | 2,3-Reductase              | (Summers et al., 1997)
megH    | EryH                            | 84        |                     | Thioesterase               | (Haydock et al., 1991)
megF    | EryF                            |           |                     | C-6 Hydroxylase            | (Weber et al., 1991)

a. Determined by BLASTX analysis using default parameters.

Three ORFs, megBV, megCIII and megDI, encode glycosyltransferases,
apparently one for attachment of each deoxysugar to the macrolide. MegBV was
most similar to EryBV, the erythromycin mycarosyltransferase, and hence was
assigned to the mycarose pathway in the meg cluster. The closest match for both of
the remaining glycosyltransferases was EryCIII, the desosaminyltransferase in
erythromycin biosynthesis. Given the higher degree of similarity between EryCIII
and MegCIII (Table 2), MegCIII was designated the desosaminyltransferase,
leaving MegDI as the proposed megosaminyltransferase. In similar fashion,
assignments were made accordingly for MegCII and MegDVI, two putative
3,4-isomerases similar to EryCII; MegBII and MegDVII, 2,3-reductases homologous
to EryBII; and MegBIV and MegDV, putative 4-ketoreductases similar to EryBIV
(Table 2). The remaining ORFs involved in deoxysugar biosynthesis, megT,
megDII, megDIII and megDIV, encode a putative 2,3-dehydratase,
aminotransferase, dimethyltransferase and 3,5-epimerase, respectively (Table 2).
Since both the megosamine and desosamine pathways require an aminotransferase
and a dimethyltransferase, and since mycarose and megosamine each require a
2,3-dehydratase and a 3,5-epimerase, assignments of these four genes to a specific
pathway could not be made on the basis of sequence comparison alone. However,
the latter three are implicated in megosamine biosynthesis by experiments
described below.
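
The assignments in Table 2 lend themselves to a simple tabular data structure. The Python sketch below transcribes the gene, closest match, percent similarity, pathway and function columns and then groups genes by proposed pathway and averages the similarity of the three megA polypeptides. It is an illustrative transcription only: the tuple layout, the use of None for blank cells and the helper names are assumptions of this example, and the row for the mycarose glycosyltransferase is entered as megBV, following the accompanying text.

```python
from collections import defaultdict
from statistics import mean

# (gene, closest match, % similarity or None, pathway or None, function)
TABLE_2 = [
    ("megT", "EryBVI", None, "Mycarose/Megosamine", "2,3-Dehydratase"),
    ("megDVI", "EryCII", 63, "Megosamine", "3,4-Isomerase"),
    ("megDI", "EryCIII", 79, "Megosamine", "Glycosyltransferase"),
    ("megY", "AcyA", 52, None, "Mycarose O-acyltransferase"),
    ("megDII", "EryCI", 58, "Megosamine", "Aminotransferase"),
    ("megDIII", "DesVI", 61, "Megosamine", "Dimethyltransferase"),
    ("megDIV", "DmnU", 65, "Megosamine", "3,5-Epimerase"),
    ("megDV", "Dehydrogenase", 61, "Megosamine", "4-Ketoreductase"),
    ("megDVII", "EryBII", 73, "Megosamine", "2,3-Reductase"),
    ("megBV", "EryBV", 86, "Mycarose", "Glycosyltransferase"),
    ("megBIV", "EryBIV", 80, "Mycarose", "4-Ketoreductase"),
    ("megAI", "EryAI", 81, "6-dEB", "Polyketide Synthase"),
    ("megAII", "EryAII", 85, "6-dEB", "Polyketide Synthase"),
    ("megAIII", "EryAIII", 83, "6-dEB", "Polyketide Synthase"),
    ("megCII", "EryCII", 82, "Desosamine", "3,4-Isomerase"),
    ("megCIII", "EryCIII", 89, "Desosamine", "Glycosyltransferase"),
    ("megBII", "EryBII", 87, "Mycarose", "2,3-Reductase"),
    ("megH", "EryH", 84, None, "Thioesterase"),
    ("megF", "EryF", None, None, "C-6 Hydroxylase"),
]


def genes_by_pathway(rows):
    """Group gene names by proposed pathway; blank pathways go to 'unassigned'."""
    grouped = defaultdict(list)
    for gene, _match, _similarity, pathway, _function in rows:
        grouped[pathway or "unassigned"].append(gene)
    return dict(grouped)


def mean_similarity(rows, gene_prefix):
    """Average % similarity over genes whose name starts with gene_prefix."""
    values = [
        similarity
        for gene, _match, similarity, _pathway, _function in rows
        if gene.startswith(gene_prefix) and similarity is not None
    ]
    return mean(values)


pathways = genes_by_pathway(TABLE_2)
print(pathways["Megosamine"])
print(mean_similarity(TABLE_2, "megA"))
```

The mean for the three megA rows comes to 83, matching the average PKS similarity quoted in the text.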

Other genes. Two additional complete ORFs, designated megY and megH,
and an incomplete ORF, designated megF, were also identified in the cluster.
MegH and MegF share high degrees of similarity with EryH and EryF. EryH and
homologs in other macrolide gene clusters are thioesterase-like proteins with
unknown function in polyketide gene clusters (Haydock et al., 1991; Xue et al.,
1998; Butler et al., 1999; Tang et al., 1999). EryF encodes the erythronolide B C-6
hydroxylase (Figure 8) (Weber et al., 1991; Andersen and Hutchinson, 1992).
MegY does not have an ery counterpart but appears to belong to a (small) family
of O-acyltransferases that transfer short acyl chains to macrolides. Two classes
exist: AcyA and MdmB transfer acetyl or propionyl groups to the C-3 hydroxyls
on 16-membered macrolide rings (Arisawa et al., 1994; Hara and Hutchinson,
1992); CarE and Mpt transfer isovalerate or propionate to the mycarosyl moiety of
carbomycin and midecamycin, respectively (Epp et al., 1989; Arisawa et al., 1993;
Gu et al., 1996). The structures of various megalomicins suggest that MegY
belongs to the latter class and is the acyltransferase which converts megalomicin A
to megalomicins B, C1, or C2 (verified experimentally below).

Heterologous expression of the meg PKS genes.

The wild type and genetically modified versions of the ery DEBS have
been used extensively in heterologous Streptomyces hosts for enzyme studies and
the production of novel polyketide compounds. Given the similarities between the
ery and meg DEBSs, production characteristics were compared in a commonly
used Streptomyces host strain. The three megA ORFs were cloned into the
expression plasmid pKAO127'kan' (Ziermann and Betlach, 1999) in place of the
eryA ORFs. Both plasmids, pKAO127'kan' encoding ery DEBS and pKOS108-06
encoding meg DEBS, were introduced in Streptomyces lividans K4-114 and the
production of 6-dEB was determined in shake-flask fermentations. The production
profiles were similar in both cases and the maximum titer of 6-dEB was between
30-40 mg/L. In addition, both PKSs produced small amounts (~5%) of 8,8a-
deoxyoleandolide, which results from the priming of the PKS with acetate instead
of propionate (Kao et al., 1994b). This observation indicates that the loading AT
domains of the PKSs display similar relaxed specificities towards starter units.


Conversion of erythromycin to megalomicin in S. erythraea. 

An examination of the meg cluster revealed that the putative megosamine
biosynthetic genes are clustered directly upstream of the PKS genes. If the
hypothesis that these genes are sufficient for biosynthesis and attachment of
megosamine to an erythromycin intermediate is correct, then functional expression
of these genes in a strain which produces erythromycin, such as S. erythraea,
should result in production of megalomicin. A 12 kb DNA fragment carrying all
the genes between the leftmost Xho I site and the EcoR I site (Figure 9) was
integrated in the chromosome of S. erythraea using the site-specific integrating
vector pSET152 (Bierman et al., 1992). It was surmised that the left and right ends
of this fragment would contain necessary promoter regions for transcription of the
convergent set of genes in M. megalomicea and that they would likely operate in
S. erythraea.

Fermentation broth from S. erythraea/pKOS97-42, which contains the
integrated meg genes, was analyzed by LC/MS and compared to LC/MS profiles
of the parent S. erythraea strain without the meg genes, as well as to megalomicin
standards purified from M. megalomicea. The new strain was found to produce a
mixture of erythromycin A and various megalomicins (~4:1 ratio), thereby
showing that the predicted megosamine biosynthetic and glycosyltransferase genes
are contained within the cloned meg fragment. The two most abundant congeners
identified were megalomicins B and C1. Megalomicin A and C2 were also
detected in smaller amounts. The presence of the megalomicins B, C1 and C2 also
provides direct evidence for the function of the O-acyl transferase, MegY, which
is present in the integrated meg fragment.

Discussion 

The homologies observed among modular PKSs enabled the use of ery
PKS genes to clone the meg biosynthetic gene cluster from M. megalomicea. The
close similarities between the megalomicin and erythromycin biosynthetic
pathways are also reflected in the overall organization of their genes and in the high
degree of homology of the corresponding individual gene-encoded polypeptides.
Production of 6-dEB from meg DEBS in S. lividans and conversion of
erythromycin to megalomicin using the megD genes in S. erythraea provide
direct evidence that the identified gene cluster is responsible for synthesis of
megalomicin.

As seen in Figure 9, the ~40 kb segments of the two clusters beginning
with ery/megBV on the left through the ery/megF genes retain a nearly identical
organizational arrangement. The notable differences in this region are eryG and
IS1136, which are absent from the segment of the meg cluster analyzed. The eryG
gene encodes an S-adenosylmethionine (SAM)-dependent mycarosyl
methyltransferase that converts erythromycin C to erythromycin A (Figure 8)
(Weber et al., 1990; Haydock et al., 1991). The mycarose moiety is modified by
esterification (MegY) in megalomicin biosynthesis (Figure 8) and, therefore, the
absence of an eryG homolog would be expected in the meg cluster. The IS1136
element located between eryAI and eryAII (Donadio and Staver, 1993) is not
known to play a role in erythromycin biosynthesis and its origin in the ery cluster
has not been determined.

Upstream of the common meg/eryBIV and BV genes, the gene clusters
diverge. The ~6 kb segment between eryBV and eryK, the left border of the ery
gene cluster (Pereda et al., 1997), contains the remaining genes required for
mycarose (eryBVI and BVII) and desosamine biosynthesis (eryCIV, CV, and CVI)
and the C-12 hydroxylase (eryK) (Stassi et al., 1993). In contrast, the region
upstream of megBV encodes a set of genes (megDI-DVII and megY) which can
account for all the activities unique to megalomicin biosynthesis (Figure 9). Since
introduction of this meg DNA segment into S. erythraea results in production of
megalomicins, it is clear that these genes encode the functions for TDP-
megosamine biosynthesis and transfer to its putative substrate erythromycin C, and
for acylation of megalomicin A (Figure 8). The remaining region upstream of megDVI
should therefore encode genes only for mycarose and desosamine biosynthesis.

Olano et al. (1999) have recently described a pathway for
biosynthesis of TDP-L-daunosamine, a deoxysugar component of the antitumor
compounds daunorubicin and doxorubicin produced by Streptomyces peucetius.
Their pathway proposes four steps from the intermediate TDP-4-keto-6-
deoxyglucose controlled by the gene cluster dnmJQTUVZ, although the functions
for dnmQ and dnmZ could not be identified and the precise order of reactions in
the pathway could not be determined. The genes dnmT, dnmU, dnmJ and dnmV
each have proposed counterparts in the meg cluster, megT, megDIV, megDII, and
megDV, respectively (see Figure 10).

It is possible to describe a pathway to convert TDP-2,6-dideoxy-3,4-
diketo-D-hexose (or its enol tautomer), the last intermediate common to the
mycarose and megosamine pathways, to TDP-megosamine through the sequence
of 5-epimerization, 4-ketoreduction, 3-amination, and 3-N-dimethylation
employing the genes megDIV, megDV, megDII, and megDIII. This employs the
same functions proposed for biosynthesis of TDP-daunosamine by Olano et al.,
but in a different sequential order. However, it does not account for the megDVI
and megDVII genes since their activities are not required for this route. A parallel
pathway which employs these genes is also shown in Figure 10. In this alternate
route, 2,3-reduction and 3,4-tautomerization are performed by the megDVII and
megDVI gene products, respectively. A unified single pathway that employs both
4-ketoreduction (megDV) and 2,3-reduction (megDVII) could not be determined.
Because the entire gene set from megDVI through megDVII was introduced in S.
erythraea to produce TDP-megosamine, it is not possible to determine which, if
either, of the two alternative pathways is operative, but this can be addressed
through systematic gene disruption and complementation.

The 48 kb segment sequenced also contains genes required for synthesis of
TDP-L-mycarose and TDP-D-desosamine (Figure 10). For the latter, megCII, which
encodes a putative 3,4-isomerase catalyzing the first step in the committed
TDP-desosamine pathway, appears to be translationally coupled to megAIII, almost
exactly as its erythromycin counterpart, eryCII, was found translationally coupled
to eryAIII (Summers et al., 1997). The high degree of similarity between MegCII
and EryCII suggests that the pathways to desosamine in the megalomicin- and
erythromycin-producing organisms are most likely the same. Similarly, the finding
that megBII and megBIV, encoding a 2,3-reductase and 4-ketoreductase, have close
homologs in the mycarose pathway for erythromycin also suggests that TDP-L-
mycarose synthesis in the two host organisms is the same.
Of interest are the two genes that encode putative 2,3-reductases, megBII
and megDVII. Because MegBII most closely resembles EryBII, a known mycarose
biosynthetic enzyme (Weber et al., 1990), and because megBII resides in the same
location of the meg cluster as its counterpart in the ery cluster, megBII is assigned
to the mycarose pathway and megDVII to the megosamine pathway. Furthermore,
the lower degree of similarity between MegDVII and either EryBII or MegBII
(Table 2) provides a basis for assigning the opposite L and D isomeric substrates
to each of the enzymes (Figure 10). Finally, megT, which encodes a putative 2,3-
dehydratase, is also related to a gene in the ery mycarose pathway, eryBVI. In S.
erythraea, the proposed intermediate generated by EryBVI represents the first
committed step in the biosynthesis of mycarose (Figure 10). However, the
proposed pathways in Figure 10 suggest this may be an intermediate common to
both mycarose and megosamine biosynthesis in M. megalomicea. Therefore, megT
is named following the designation of the equivalent gene in the daunosamine
pathway, dnmT (Olano et al., 1999).

The preferred host-vector system for expression of meg DEBS described
here has been used previously for the heterologous expression of modular PKS
genes from the erythromycin (Kao et al., 1994a; Ziermann and Betlach, 1999),
picromycin (Tang et al., 1999) and oleandomycin pathways, as well as for the
generation of novel polyketide backbones where domains have been removed,
added or exchanged in various combinations (McDaniel et al., 1999). Recently,
hybrid polyketides have been generated through the co-expression of subunits
from different PKS systems (Tang et al., 2000).

Expression of the megDVI-megDVII segment in S. erythraea and the
corresponding production of megalomicins in this host establishes the likely order
of sugar attachment in megalomicin synthesis. Furthermore, it provides a means to
produce megalomicin in a more genetically friendly host organism, leading to the
creation of megalomicin analogs by manipulating the PKS. Over 60 6-dEB
analogs have been produced by combinatorial biosynthesis using the ery PKS
(McDaniel et al., 1999; Xue et al., 1999). The titers of megalomicin could also be
significantly increased above the 5 mg/L obtained from M. megalomicea by
introducing the genes into an industrially optimized strain of S. erythraea, many of
which can produce as much as 10 g/L of erythromycin.

Example 2

Stabilizing meg PKS Expression Plasmid by Codon Engineering

Materials and methods

All bacterial strains were cultured and transformed as described in
Example 1.

Fermentation of Streptomyces and diketide feeding

Primary Streptomyces transformants were picked and placed in 6 mL of
TSB liquid medium with 50 µg/L of thiostrepton and grown at 30°C. When the
culture showed some growth (3-4 days), it was transferred into a 250 mL flask
containing 50 mL of R6 medium (pH 7.0) with 25 µg/L of thiostrepton and 1 g/L of
diketide ((2S,3R)-2-methyl-3-hydroxyhexanoate N-propionyl cysteamine thioester)
and placed in a 30°C incubator for 7 days.



Changing codons and making plasmids

There are several identical sequences in the coding sequences for module 2
and module 6 of the megalomicin PKS gene cluster. Expression plasmids
containing the full length megalomicin PKS appeared to be somewhat unstable
and subject to deletion in recA+ strains like ET 124567 and Streptomyces by
intra-plasmid homologous recombination. To prevent significant homologous
recombination and so stabilize expression plasmids, the codons of two regions of
the module 6 coding sequence that are identical to regions in the module 2 coding
sequence were changed without changing the sequence of the protein encoded. The
two regions changed in module 6 were from the 26,739th base to the 27,267th base
and from the 27,697th base to the 27,987th base, which were identical to the region
from the 6,810th base to the 7,338th base and the region from the 7,778th base to
the 8,068th base, respectively. Positions are numbered from the start codon of the
loading domain of the meg PKS, whose first base is base 1. These sequences are
shown below.



> 6810-7338 Sequence in Module 2

TTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT 
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG 

GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG 
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT 
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 23) 

> 26736-27267 Sequence in Module 6

CTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT 
TCGGCGGTGAATCAGGATGGGGCGAGTAATGGGTTGGCGGCGCCGTCGGGGGTGGCGCAG 
CAGCGGGTGATTCGGCGGGCGTGGGGTCGTGCGGGTGTGTCGGGTGGGGATGTGGGTGTG 
GTGGAGGCGCATGGGACGGGGACGCGGTTGGGGGATCCGGTGGAGTTGGGGGCGTTGTTG 

GGGACGTATGGGGTGGGTCGGGGTGGGGTGGGTCCGGTGGTGGTGGGTTCGGTGAAGGCG
AATGTGGGTCATGTGCAGGCGGCGGCGGGTGTGGTGGGTGTGATCAAGGTGGTGTTGGGG 
TTGGGTCGGGGGTTGGTGGGTCCGATGGTGTGTCGGGGTGGGTTGTCGGGGTTGGTGGAT
TGGTCGTCGGGTGGGTTGGTGGTGGCGGATGGGGTGCGGGGGTGGCCGGTGGGTGTGGAT 
GGGGTGCGTCGGGGTGGGGTGTCGGCGTTTGGGGTGTCGGGGACGAAT (SEQ ID NO: 24) 

> 26736-27267 Sequence with Codon Changes

CTGCAGCGCCTCTCCGTCGCCGTCCGCGAGGGCCGCCGAGTCCTCGGCGTCGTCGTCGGC
TCGGCCGTCAACCAAGACGGCGCGTCAAACGGCCTCGCCGCGCCCTCCGGCGTCGCCCAG 
CAGCGCGTCATACGCCGCGCGTGGGGACGCGCCGGAGTATCGGGCGGCGACGTCGGAGTC 
GTCGAGGCCCACGGCACCGGCACCCGCCTCGGGGATCCCGTCGAGCTGGGCGCCCTCCTG 
GGCACGTACGGCGTCGGCCGCGGCGGCGTCGGCCCGGTCGTCGTCGGCAGCGTCAAGGCC 
AACGTCGGCCACGTCCAGGCCGCGGCCGGCGTCGTCGGGGTCATCAAGGTCGTCCTCGGC
CTCGGCCGCGGGCTGGTCGGCCCGATGGTCTGCCGCGGCGGCCTCAGCGGCCTCGTCGAC 
TGGTCGTCCGGCGGCCTGGTCGTCGCGGACGGGGTCCGCGGCTGGCCGGTCGGCGTCGAC 
GGCGTCCGCCGGGGCGGCGTCTCGGCGTTCGGCGTCAGCGGGACGAAT (SEQ ID NO: 25) 



> 6978-7337 Sequence in Module 2

GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA 
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT 
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA 

TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 26)

> 27697-27987 Sequence in Module 6 

GGTGGAGTGTGATGCGGTGGTGTCGTCGGTGGTGGGGTTTTCGGTGTTGGGGGTGTTGGA 
GGGTCGGTCGGGTGCGCCGTCGTTGGATCGGGTGGATGTGGTGCAGCCGGTGTTGTTCGT 
GGTGATGGTGTCGTTGGCGCGGTTGTGGCGGTGGTGTGGGGTTGTGCCTGCGGCGGTGGT
GGGTCATTCGCAGGGGGAGATCGCGGCGGCGGTGGTGGCGGGGGTGTTGTCGGTGGGTGA 
TGGTGCGCGGGTGGTGGCGTTGCGGGCGCGGGCGTTGCGGGCGTTGGCCGG (SEQ ID NO: 27)

> 27697-27987 Sequence with Codon Changes 

CGTGGAGTGCGATGCGGTCGTGTCGAGCGTCGTCGGCTTCAGCGTGCTGGGCGTCCTGGA
GGGCCGCAGCGGCGCCCCGAGCCTGGACCGCGTCGACGTGGTCCAGCCGGTCCTGTTCGT 
GGTCATGGTCAGCCTGGCCCGCCTGTGGCGCTGGTGCGGCGTGGTCCCGGCCGCCGTGGT 
CGGCCACAGCCAGGGCGAGATCGCCGCCGCGGTCGTGGCCGGCGTCCTGAGCGTCGGCGA 
CGGCGCCCGCGTCGTGGCCCTGCGCGCCCGCGCCCTGCGCGCCCTGGCCGG (SEQ ID NO: 28)
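
The statement that the codon changes leave the encoded protein unchanged can be checked computationally. The following is a minimal sketch in Python, not part of the original work. It assumes that the excerpts are read in frame from their first base and that the standard genetic code applies; all function and variable names are illustrative. It compares the first 60 bases of SEQ ID NO: 24 (module 6, original) with the first 60 bases of SEQ ID NO: 25 (module 6, recoded).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# First 60 bases of SEQ ID NO: 24 (original) and SEQ ID NO: 25 (recoded).
ORIGINAL = "CTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT"
RECODED = "CTGCAGCGCCTCTCCGTCGCCGTCCGCGAGGGCCGCCGAGTCCTCGGCGTCGTCGTCGGC"


def translate(dna):
    """Translate from the first base (assumption: the excerpt is in frame)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of aligned positions that carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    print("same protein:", translate(ORIGINAL) == translate(RECODED))
    print("nucleotide identity:", round(identity(ORIGINAL, RECODED), 2))
```

Under these assumptions the two excerpts translate to the same peptide while differing at many nucleotide positions, which is the property the recoding relies on. The same check can be applied to the full-length sequences.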



Three pieces of DNA from the two regions above were synthesized and verified by
Retrogen, and the synthesized DNAs were cloned into pCR-Blunt II-TOPO, as
shown in Table 3 below.


Table 3. Plasmids containing synthesized DNA

Plasmids       Cloning sites and positions in meg PKS
pKOS97-1613    PstI-BamHI, 26,739th-26,947th base
pKOS97-1622    BamHI-BsmI, 26,947th-27,267th base
pKOS97-1628    SfaNI-FseI, 27,697th-27,987th base



Assembly of the expression plasmid

First, ligation of the PstI-BamHI fragment of pKOS97-1613, the BamHI-BsmI
fragment of pKOS97-1622 and BsmI-PstI linearized pKOS97-90 produced
pKOS97-151. Then, the insertion of the SfaNI-FseI fragment of pKOS97-1628
into pKOS97-151 gave rise to pKOS97-152. Then, the PstI-BlpI fragment of
pKOS97-125 was used to replace the PstI-BlpI fragment of pKOS97-90a and
produced pKOS97-160.

The final expression plasmid (in pRM5), pKOS97-162, was made by inserting the
BglII-NheI fragment of pKOS97-160 into the BglII-NheI sites of pKOS108-04.

Another expression plasmid, pKOS97-152a, was made by a four-fragment
ligation. The four fragments were a BlpI-XbaI fragment (containing a cos site) of
pKOS97-92a, a BglII-PstI fragment of pKOS97-81, a PstI-BlpI fragment of
pKOS97-152, and a BglII-XbaI fragment of pKOS108-04 (as the vector).

Tests of the constructed plasmids showed that the plasmids containing the
modified coding sequences were more stable than plasmids containing unmodified
coding sequence.
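
One way to picture the effect of the recoding is to measure the longest stretch of bases shared between the module 2 region and the module 6 region, before and after the codon changes. The sketch below is illustrative only and is not the stability test used in the example. It uses the first 60 bases of SEQ ID NO: 23, 24 and 25, and the function name `longest_shared_run` is hypothetical.

```python
def longest_shared_run(a, b):
    """Length of the longest stretch of bases that occurs in both a and b.

    Classic dynamic-programming longest-common-substring, keeping only
    the previous row to limit memory use.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# First 60 bases of each sequence as printed in this example.
MODULE2 = "TTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT"          # SEQ ID NO: 23
MODULE6 = "CTGCAGCGGTTGTCGGTGGCGGTGCGGGAGGGGCGTCGGGTGTTGGGTGTGGTGGTGGGT"          # SEQ ID NO: 24
MODULE6_RECODED = "CTGCAGCGCCTCTCCGTCGCCGTCCGCGAGGGCCGCCGAGTCCTCGGCGTCGTCGTCGGC"  # SEQ ID NO: 25

if __name__ == "__main__":
    print("module 2 vs module 6:", longest_shared_run(MODULE2, MODULE6))
    print("module 2 vs recoded module 6:", longest_shared_run(MODULE2, MODULE6_RECODED))
```

On these excerpts the unmodified module 6 sequence shares a nearly full-length run with module 2, while the recoded sequence shares only short runs, which is consistent with the reduced recombination reported above.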

Example 3

Construction of Ole-Meg Hybrid PKS

Construction of pRM1-based pKOS98-48 for the expression of OlePKS modules
1-4.

The 240-bp fragment containing the 3'-end portion of the oleAII gene (at nt
11210-11452; the first base of the start codon of oleAII is nt 1) was PCR amplified
with primers N98-38-1 (5'-GAACAACTCCTGTCTGCGGCCGCG-3') (SEQ ID
NO: 29) and N98-38-3 (5'-
CGGAATTCTCTAGAGTCACGTCTCCAACCGCTTGTCGAGG-3') (SEQ ID
NO: 30). The fragment contains a naturally occurring NotI site at its 5'-end and
the engineered XbaI (bold) and EcoRI sites (underline) at its 3'-end following the
oleAII stop codon. pKOS38-189 was digested with EcoRI and NotI to give five
fragments of 8 kb, 5 kb, 4 kb, 2.5 kb and 2 kb. The 8-kb EcoRI-NotI fragment
containing oleAII gene nt 2961 to nt 11210 and the 240-bp NotI, EcoRI treated
PCR fragment were ligated into Litmus 28 at the EcoRI site via a three-fragment
ligation to give pKOS98-46. The 8.2-kb EcoRI fragment from pKOS98-46 was
cloned into pKOS38-174, a pRM1-derived plasmid containing oleAI and nt 1 to nt
2960 of oleAII, to give pKOS98-48.
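
The restriction sites described for the two primers can be located with a simple scan. The sketch below is illustrative and not part of the original protocol. It uses the primer sequences as given above with spacing removed and the recognition sequences of NotI (GCGGCCGC), XbaI (TCTAGA) and EcoRI (GAATTC); the helper name `find_sites` is hypothetical. Because all three sites are palindromic, scanning one strand is sufficient.

```python
# Recognition sequences of the three enzymes named in the text.
SITES = {"NotI": "GCGGCCGC", "XbaI": "TCTAGA", "EcoRI": "GAATTC"}

# Primer sequences (5'->3') as given in the example, spacing removed.
PRIMERS = {
    "N98-38-1": "GAACAACTCCTGTCTGCGGCCGCG",
    "N98-38-3": "CGGAATTCTCTAGAGTCACGTCTCCAACCGCTTGTCGAGG",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each site found in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = [
            i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)
        ]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

The scan finds a NotI site in N98-38-1 and adjacent EcoRI and XbaI sites near the 5'-end of N98-38-3, in agreement with the description of the amplified fragment.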

Construction of pSET152-based pKOS98-60 for the expression of meg PKS
modules 5-6.

The 360-bp fragment containing nt 1 to nt 366 of megAIII was PCR
amplified with primers N98-40-3 (5'-
TCTAGACTTAATTAAGGAGGACACATATGAGCGAGAGCAGCGGCATGACCG-3')
(SEQ ID NO: 31) and N98-40-2 (5'-AACGCCTCCCAGGAGATCTCCAGCA-3')
(SEQ ID NO: 32). A PacI site and an NdeI site as well
as the ribosome binding site were introduced at the 5'-end of the megAIII start
codon. The 360-bp PacI-BglII fragment was inserted into pKOS108-06 replacing
the 22-kb PacI-BglII fragment to yield pKOS98-55. The 10-kb PacI-XbaI
fragment containing the megAIII gene and the annealed oligos N98-23-1 (5'-
AATTCATAGCCTAGGT-3') (SEQ ID NO: 33) and N98-23-2 (5'-
CTAGACCTAGGCTATG-3') (SEQ ID NO: 34) were ligated to PacI and EcoRI
treated pSET152 derivative pKOS98-14 via a three-fragment ligation to give
pKOS98-60.
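
The role of the annealed oligos as an adapter can be illustrated by computing the duplex they form and the single-stranded ends left over. The sketch below is a simplified model, not part of the original protocol. It assumes that the 3' portions of the two oligos pair with each other and leave 5' overhangs; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def anneal(top, bottom):
    """Pair two oligos (both written 5'->3') through their 3' portions.

    Returns (top 5' overhang, bottom 5' overhang, duplex length in bp),
    choosing the longest perfectly complementary duplex.
    """
    rc = revcomp(bottom)
    for k in range(min(len(top), len(rc)), 0, -1):
        if top.endswith(rc[:k]):
            return top[:len(top) - k], bottom[:len(bottom) - k], k
    raise ValueError("oligos share no complementary region")


if __name__ == "__main__":
    n98_23_1 = "AATTCATAGCCTAGGT"  # SEQ ID NO: 33
    n98_23_2 = "CTAGACCTAGGCTATG"  # SEQ ID NO: 34
    print(anneal(n98_23_1, n98_23_2))
```

In this model the annealed pair has a 5'-AATT overhang, which matches an EcoRI-cut end, and a 5'-CTAG overhang, which matches an XbaI-cut end. This is consistent with joining the XbaI end of the megAIII fragment to the EcoRI end of the vector in the three-fragment ligation.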

Example 4

Conversion of Erythronolides to Erythromycins

A sample of a polyketide (~50 to 100 mg) is dissolved in 0.6 mL of
ethanol and diluted to 3 mL with sterile water. This solution is used to overlay a
three day old culture of Saccharopolyspora erythraea WHM34 (an eryA mutant)
grown on a 100 mm R2YE agar plate at 30°C. After drying, the plate is incubated
at 30°C for four days. The agar is chopped and then extracted three times with 100
mL portions of 1% triethylamine in ethyl acetate. The extracts are combined and
evaporated. The crude product is purified by preparative HPLC (C-18 reversed
phase, water-acetonitrile gradient containing 1% acetic acid). Fractions are
analyzed by mass spectrometry, and those containing pure compound are pooled,
neutralized with triethylamine, and evaporated to a syrup. The syrup is dissolved
in water and extracted three times with equal volumes of ethyl acetate. The
organic extracts are combined, washed once with saturated aqueous NaHCO3,
dried over Na2SO4, filtered, and evaporated to yield ~0.15 mg of product. The
product is a glycosylated and hydroxylated compound corresponding to
erythromycin A, B, C, and D but differing therefrom as the compound provided
differed from 6-dEB.

Example 5

Measurement of Antibacterial Activity

Antibacterial activity is determined either by disk diffusion assays with
Bacillus cereus as the test organism or by measurement of minimum inhibitory
concentrations (MIC) in liquid culture against sensitive and resistant strains of
Staphylococcus pneumoniae.

Example 6

Evaluation of Antiparasitic Activity

Compounds can initially be screened in vitro using cultures of P. falciparum
FCR-3 and K1 strains, then in vivo using mice infected with P. berghei. Mammalian
cell toxicity can be determined in FM3A or KB cells. Compounds can also be
screened for activity against P. berghei. Compounds are also tested in animal studies
and clinical trials to test the antiparasitic activity broadly (antimalarial,
trypanosomiasis and leishmaniasis).

The invention having now been described by way of written description
and example, those of skill in the art will recognize that the invention can be
practiced in a variety of embodiments and that the foregoing description and
examples are for purposes of illustration and not limitation of the following
claims.

Claims

1. An isolated nucleic acid comprising a nucleotide sequence
encoding a domain of megalomicin polyketide synthase (PKS) or a megalomicin
modification enzyme.

2. The isolated nucleic acid of claim 1, which encodes a PKS open
reading frame (ORF) selected from the group consisting of megAI, megAII and
megAIII.

3. The isolated nucleic acid of claim 1, wherein the PKS domain is
selected from the group consisting of a TE domain, a KS domain, an AT domain,
an ACP domain, a KR domain, a DH domain, and an ER domain.

4. The isolated nucleic acid of claim 1, wherein the nucleic acid
comprises the coding sequence for a loading module, a thioesterase domain, and
all six extender modules of megalomicin PKS.

5. The isolated nucleic acid of claim 1, which encodes a megalomicin
modification enzyme that is involved in the conversion of 6-dEB into a
megalomicin.

6. The isolated nucleic acid of claim 5, which encodes a megalomicin
modification enzyme that is involved in the biosynthesis of mycarose,
megosamine or desosamine.

7. The isolated nucleic acid of claim 1, wherein the nucleic acid
codons of homologous regions within the PKS or the megalomicin modification
enzyme coding sequence have been changed to reduce or abolish the homology
without changing the amino acid sequences encoded by said changed nucleic acid
codons.






8. The isolated nucleic acid of claim 1, which isolated nucleic acid
fragment hybridizes to a nucleic acid having a nucleotide sequence set forth in
SEQ ID NO:1.

9. A polypeptide, which is encoded by the isolated nucleic acid
fragment of claim 1.

10. A recombinant DNA expression vector, comprising the isolated
nucleic acid of claim 1 operably linked to a promoter.

11. A recombinant host cell, comprising the recombinant DNA
expression vector of claim 10.

12. The recombinant host cell of claim 11, which is a Streptomyces or
Saccharopolyspora host cell.

13. A recombinant host cell of claim 11, which comprises:

a) at least two separate autonomously replicating recombinant DNA
expression vectors, each of said vectors comprises a recombinant DNA compound
encoding a megalomicin PKS domain or a megalomicin modification enzyme
operably linked to a promoter; or

b) at least one autonomously replicating recombinant DNA expression
vector and at least one modified chromosome, each of said vector(s) and each of
said modified chromosome comprises a recombinant DNA compound encoding a
megalomicin PKS domain or a megalomicin modification enzyme operably linked
to a promoter.

14. A hybrid PKS that comprises a polypeptide of claim 9 and is
composed of at least a portion of a megalomicin PKS and at least a portion of a
second PKS for a polyketide other than megalomicin.

15. The hybrid PKS of claim 14, wherein the second PKS is selected
from the group consisting of a narbonolide PKS, an oleandolide PKS, and a DEBS
PKS.

16. The hybrid PKS of claim 15 that is composed of the megAI and
megAII gene products and the oleAIII gene product.

17. The hybrid PKS of claim 16, wherein the KS domain of module 1
of the megAI gene product has been inactivated by mutation.

18. A method of producing a polyketide, which method comprises
growing the recombinant host cell of claim 11 under conditions whereby the
megalomicin PKS domain encoded by the recombinant expression vector is
produced and the polyketide is synthesized by the cell, and recovering the
synthesized polyketide.

19. A recombinant host cell that comprises a recombinant expression
vector that encodes a megalomicin modification enzyme.

20. The recombinant host cell of claim 19 that produces megosamine
and can attach megosamine to a polyketide, wherein said host cell, in its naturally
occurring non-recombinant state cannot produce megosamine.

[Structure drawings: megalomicins, azithromycin, erythromycin A]

Structures of the Megalomicins and Azithromycin
Figure 3

AT = acyltransferase
ACP = acyl carrier protein
KS = ketosynthase
KR = ketoreductase
DH = dehydratase
ER = enoyl reductase
TE = thioesterase

Biosynthesis of 6-Deoxyerythronolide B (6-dEB), the Aglycone of Erythromycin, by a
Modular PKS

Figure 4

Erythromycin Biosynthetic Pathway and Megalomicin Biosynthesis

Figure 5 

gene

1   47981 bp   DNA   01-MAY-2000

Megalomicin biosynthetic gene cluster, polyketide synthase,
desosamine, megosamine, and mycarose biosynthesis genes.
1

Micromonospora megalomicea.
Micromonospora megalomicea
Unclassified.

1 (bases 1 to 47981)
Volchegursky,Y., Hu,Z., Katz,L. and McDaniel,R.
Biosynthesis of the Anti-Parasitic Agent Megalomicin:
Transformation of Erythromycin to Megalomicin in Saccharopolyspora
erythraea
Unpublished

2 (bases 1 to 47981)
McDaniel,R. and Volchegursky,Y.
Direct Submission
Submitted (01-MAY-2000) Kosan Biosciences, Inc., 3828 Bay Center
Place, Hayward, CA 94545, USA

Location/Qualifiers

source  1..47981
/organism="Micromonospora megalomicea"
/strain="NRRL3275"
/sub_species="nigra"

gene  complement(<1..144)
/gene="megT"
CDS  complement(<1..144)
/gene="megT"
/codon_start=1
/transl_table=11
/product="TDP-4-keto-6-deoxyglucose-2,3-dehydratase"
/translation="MGDRVNGHATPESTQSAIRFLTRHGGPPTATDDVHDWLAHRAAE
[RLE" (SEQ ID NO: 2)

gene  928..2061
/gene="megDVI"
CDS  928..2061
/gene="megDVI"
/codon_start=1
/transl_table=11
/product="TDP-4-keto-6-deoxyhexose 3,4-isomerase"
/translation="MAVGDRRRLGRELQMARGLYWG FG ANGDL YSMLL S GRDDDP WTW

YERLRAAGRGPYASRAGTWWGDHRTAAEVLADPGFTHGPPDAARWMQVAHCPAASWA 

GPFREFYARTEDAASVTVDADWLQQRCARLVTELGSRFDLVNDFAREVPVLALGTAPA 

LKGVDPDRLRS WTS ATRVCLDAQVS PQQLAVTEQALTALDE I DAVTGGRDAAVLVGW 

AELAANTVGNAVLAVTELPELAARLADD PETATR WTE VS RTS PG VHLERRTAAS DRR 

VGGVDVPTGGEVTVWAAANRDPEVFTDPDRFDVDRGGDAEILSSRPGSPRTDLDALV 

ATLATAALRAAAPVLPRLSRSGPVIRRRRSPVARGLSRCPVEL" (SEQ ID NO: 3) 

gene  2072..3382
/gene="megDI"
CDS  2072..3382
/gene="megDI"
/codon_start=1
/transl_table=11
/product="TDP-megosamine glycosyltransferase"
/translation="MRWFSSMAVNSHLFGLVPLAS AFQAAGHEVRWASPALTDDVT

GAGLTAVPVGDDVELVEWHAHAGQDIVEYMRTLDWVDQSHTTMSWDDLLGMQTTFTPT 

FFALMSPDSLIDGMVEFCRSWRPDWIVWEPLTFAAPIAARVTGTPHARMLWGPDVATR 

ARQSFLRLLAHQEVEHREDPLAEWFDWTLRRFGDDPHLSFDEELVLGQWTVDPIPEPL 

RIDTGVRTVGMRYVPYNGPSWPAWLLREPERRRVCLTLGGSSREHGIGQVSIGEMLD 

AI AD I DAE F VATFDDQQL VGVGS VPANVRTAGFVPMNVLL PTCAATVHHGGTGS WLTA 

AIHGVPQI ILSDADTEVHAKQLQDLGAGLSLPVAGMTAEHLRGAIERVLDEPAYRLGA 

ERMRDGMRTDPSPAQWGICQDLAADRAARGRQPRRTAEPHLPR" (SEQ ID NO: 4) 

gene  3462..4634
/gene="megY"
CDS  3462..4634
/gene="megY"
/codon_start=1
/transl_table=11
/product="mycarose O-acyltransferase"
/translation="MVTSTNLDTTARPALNSIiTGMRFVAAFLVFFTHVLSRLIPNSYV

YADGLDAFWQTTGRVGVSFFFILSGFVLTWSARASDSVWSFWRRRVCKLFPNHLVTAF 

AAWLFLVTGQAVSGEALIPNLLLIHAWFPALEISFGINPVSWSLACEAFFYLCFPLF 

LFWISGIRPERLWAWAAWFAAIWAVPWADLLLPSSPPLIPGLEYSAIQDWFLYTFP 

ATRSLEFILGIILARILITGRWINVGLLPAVLLFPVFFVASLFLPGVYAISSSMMILP 

LVLIIASGATADLQQKRTFMRNRV>TVWLGDVSFALYMVHFLVIVYGADLLGFSQTEDA 

PLGLALFMI IPFLAVSLVLSWLLYRFVELPVMRNWARPASARRKPATEPEQTPSRR" (SEQ ID NO: 5)

gene  4651..5775
/gene="megDII"
CDS  4651..5775
/gene="megDII"
/codon_start=1
/transl_table=11
/product="TDP-3-keto-6-deoxyhexose 3-aminotransaminase"
/translation="MTTYVWS YLLEYERERADILDAVQKVFASGS LI LGQS VENFETE

YARYHGIAHCVGVDNGTNAVKLALESVGVGRDDEVVTVSNTAAPTVLAIDEIGARPVF 

VDVRDEDYLMDTDLVEAAVTPRTKAIVPVHLYGQCVDMTALRELADRRGLKLVEDCAQ 

AHGARRDGRLAGTMSDAAAFSFYPTKVLGAYGDGGAWTNDDETARALRRLRYYGMEE 

VYYVTRTPGHNSRLDEVQAEILRRKLTRLDAYVAGRRAVAQRYVDGLADLQDSHGLEL 

PWTDGNEHVFYVYVVRHPRRDEIIKRLRDGYDISLNISYPWPVHTMTGFAHLGVASG 

SLPVTERLAGE I FSLPMYPSLPHDLQDRVIEAVREVITGL " ( SE Q ID N0: 6 ) 

gene  5822..6595
/gene="megDIII"
CDS  5822..6595
/gene="megDIII"
/codon_start=1
/transl_table=11
/product="daunosaminyl-N,N-dimethyltransferase"
/translation="MPNSHSTTSSTDVAPYERADIYHDFYHGRGKGYRAEADAIiVEVA

RKHTPQAATLLDVACGTGSHLVELADSFREWGVDLSAAMLATAARNDPGRELHQGDM 

RDFSLDRRFDWTCMFSSTGYLVDEAELDRAVANLAGHLAPGGTLWEPWWFPETFRP 

GWVGADLVTSGDRRISRMSHTVPAGLPDRTASRMTIHYTVGSPEAGIEHFTEVHVMTL 

FARAAYEQAFQRAGLSCSYVGHDLFSPGLFVGVAAEPGR" (SEQ ID NO: 7) 

gene  6592..7197
/gene="megDIV"
CDS  6592..7197
/gene="megDIV"
/codon_start=1
/transl_table=11
/product="TDP-4-keto-6-deoxyhexose 3,5-epimerase"
/translation="MRVE E LG I EGVFTFt PQTF ADERG VFGTAYQED VF VAALGRPLF

P VAQVS TTR S RRG WRGVH FTTMPGS MAKYVYC ARGRAMDFAVD I R PGS PTFGRAEP V 

ELSAESMVGLYLPVGMGHLFVSLEDDTTLVYLMSAGYVPDKERAVHPLDPELALPIPA 

DLDLVMS ERDRVAPTLREARDQGI LPD YAACRAAAHRWRT " (SEQ ID NO: 8) 

gene  7220..8206
/gene="megDV"
CDS  7220..8206
/gene="megDV"
/codon_start=1
/transl_table=11
/product="TDP-4-keto-6-deoxyhexose 4-ketoreductase"
/translation="MWLGASGFLGSAVTHALADLPVRVRLVARREVWPSGAVADYE

THRVDLTE PGALAE WADARAVFP FAAQ IRGTS G WR I SEDD WAERTNVGLVRDL IAV 

LSRSPHAPVWFPGSNTQVGRVTAGRVIDGSEQDHPEGVYDRQKHTGEQLLKEATAAG 

AIRATSLRLPPVFGVPAAGTADDRGWSTMIRRALTGQPLTMWHDGTVRRELLYVTDA 

ARAFVTALDHADALAGRHFIiLGTGRSWPLGEVFQAVSRSVARHTGEDPVPWSVPPPA 

HMDPSDLRSVEVDPARFTAVTGWRATVTMAEAVDRTVAALAPRRAAAPSEPS" (SEQ ID NO: 9)

gene  complement(8228..9220)
/gene="megDVII"
CDS  complement(8228..9220)
/gene="megDVII"
/codon_start=1
/transl_table=11
/product="TDP-4-keto-6-deoxyhexose 2,3-reductase"
/translation="MGTTGAGSARVRVGRSALHTSRLWLGTVNFSGRVTDDDALRLMD

HALERGVNCIDTADIYGWRLYKGHTEELVGRWFAQGGGRREETVLATKVGSEMSERVN 

DGGLS ARH I VAACENS LRRLGVDH ID I YQTHHI DRAAPWDEVWQAAEHLVGSGKVGYV 

GSSNLAGWHIAAAQESAARRNLLGMISHQCLYNLAVRHPELDVLPAAQAYGVGVFAWS 

PLHGGLLSGVLEKLAAGTAVKSAQGRAQVLLPAVRPLVEAYEDYCRRLGADPAEVGLA 

WLSRPGILGAVIGPRTPEQLDSALRAAELTLGEEELRELEAIFPAPAVDGPVP" (SEQ ID NO: 10)

gene  complement(9226..10479)
/gene="megBV"
CDS  complement(9226..10479)
/gene="megBV"
/codon_start=1
/transl_table=11
/product="TDP-mycarose glycosyltransferase"
/translation="MRVLLTSFAHRTHFQGLVPLAWALHTAGHDVRVASQPELTDWV

GAGLTSVPLGSDHRLFDISPEAAAQVHRYTTDLDFARRGPELRSWEFLHGIEEATSRF 

VFPWNNDSFVDELVEFAMDWRPDLVLWEPFTFAGAVAAKACGAAHARLLWGSDLTGY 

FRSRSQDLRGQRPADDRPDPLGGWLTEVAGRFGLDYSEDLAVGQWSVDQLPESFRLET 

GLESVHTRTLPYNGSSWPQWLRTSDGVRRVCFTGGYSALGITSNPQEFLRTLATLAR 

FDGEIWTRSGLDPASVPDNVRLVDFVPMNILLPGCAAVIHHGGAGSWATALHHGVPQ 

ISVAHEWDCVLRGQRTAEIiGAGVFLRPDEVDADTLWQALATVVEDRSHAENAEKLRQE 

ALAAPT PAE W P VLEAL AHQHRADR " (SEQ ID NO: 11) 

gene  complement(10483..11424)
/gene="megBIV"
CDS  complement(10483..11424)
/gene="megBIV"
/codon_start=1
/transl_table=11
/product="TDP-4-keto-6-deoxyhexose 4-ketoreductase"
/translation="MTRHVTLLGVSGFVGSALLREFTTHPLRLRAVARTGSRDQPPGS

AG I EHLRVDLLE PGRVAQ WADTDVVVHLVAYAAGGSTWRS AATVPEAERVNAG I MRD 

LVAALRARPGPAPVLLFASTTQAANPAAPSRYAQHKIEAERILRQATEDGWDGVILR 

L P A I YGHSG PS GQTGRG WTAM I RRALAGEP I TMWHEG S VRRNLLHVEDVATAFTAAL 

HNHEALVGDWTPSADEARPLGEIFETVAASVARQTGNPAVPWSVPPPENAEANDFR 

SDDFDSTEFRTLTGWHPRVPLAEGIDRTVAAIil STKE " (SEQ ID NO: 12) 

gene  12181..22821
/gene="megAI"
CDS  12181..22821
/gene="megAI"
/note="polyketide synthase"
/codon_start=1
/transl_table=11
/product="megalomicin 6-deoxyerythronolide B synthase 1"
/translation="MVDVPDLLGTRTPHPGPLPFPWPLCGHNEPELRARARQLHAYLE

GISEDDWAVGAALARETRAQDGPHRAVVVASSVTELTAALAALAQGRPHPSVVRGVA 

RPTAPWFVLPGQGAQWPGMATRLLAESPVFAAAMRACERAFDEVTDWSLTEVLDSPE 

HLRRVEWQPALFAVQTSLAALWRSFGVRPDAVLGHSIGELAAAEVCGAVDVEAAARA 

AALWSREMVPLVGRGDI4AAVAIiSPAELAARVERWDDDVVPAGVNGPRSVLLTGAPEPI 

ARRVAELAAQGVRAQWNVSMAAHSAQVDAVAEGMRSALTWFAPGDSDVPYYAGLTGG 

RLDTRELGADHWPRSFRLPVRFDEATRAVLELQPGTFIESSPHPVLAASLQQTLDEVG 

SPAAIVPTLQRDQGGLRRFLLAVAQAYTGGVTVDWTAAYPGVTPGHLPSAVAVETDEG 

PSTEFDWAAPDHVLRARLLEIVGAETAALAGREVDARATFRELGLDSVLAVQLRTRLA 

TATGRDLHIAMLYDHPTPHALTEALLRGPQEEPGRGEETAHPTEAEPDEPVAWAMAC 

RLPGGVTSPEEFWELLAEGRDAVGGLPTDRGWDLDSLFHPDPTRSGTAHQRAGGFLTG 

ATSFDAAFFGLSPREALAVEPQQRITLELSWEVLERAGIPPTSLRTSRTGVFVGLIPQ 

EYGPRLAEGGEGVEGYLMTGTTTSVASGRVAYTLGLEGPAISVDTACSSSLVAVHLAC 

QSLRRGESTMALAGG\nrVMPTPGMLVDFSRMNSLAPDGRSKAFSAAADGFGMAEGAGM 

LLLERLSDARRHGHPVLAVIRGTAVNSDGASNGLSAPNGRAQVRVIRQALAESGLTPH 

TVDWETHGTGTRLGDPIEARALSDAYGGDREHPLRIGSVKSNIGHTQAAAGVAGIjIK
LVLAMQAGVLPRTLHADEPSPEIDWSSGAISLLQEPAAWPAGERPRRAGVSSFGISGT

NAHAIIEEAPPTGDDTRPDRMGPWPV^SASTGEALRARAARLAGHLREHPDQDLDD 

VAYSLATGRAALAYRSGFVPADASTALRILDELAAGGSGDAVTGTARAPQRWFVFPG 

QGWQWAGMAVDLLDGDPVFASVLRECADALEPYLDFEIVPFLRAEAQRRTPDHTLSTD 

RVD WQP VL FAVMVS LAARWRAYG VE P A^VI GH S QGE I AAAC VAGALS LDDAARAVAL 

RSRVIATMPGNGAMASIAASVDEVAARIDGRVEIAAVNGPRAWVSGDRDDLDRLVAS 

CTVEGVRAKRLPVDYASHSSHVEAVRDALHAELGEFRPLPGFVPFYSTVTGRWVEPAE 

LDAGYWFRNLRHRVRFADAVRSLADQGYTTFLEVSAHPVLTTAIEEIGEDRGGDLVAV 

HSLRRGAGGPVDFGSALARAFVAGVAVDWESAYQGAGARRVPLPTYPFQRERFWLEPN 

PARRVADSDDVSSLRYRIEWHPTDPGEPGRLDGTWLLATYPGRADDRVEAARQALESA 

GARVEDIiWEPRTGRVDLVRRLDAVGPVAGVLCLFAVAEPAAEHSPLAVTSLSDTLDL 

TQAVAGSGRECPIWWTENAVAVGPFERLRDPAHGALWALGRVVALENPAVWGGLVDV

PSGSVAELSRHLGTTLSGAGEDQVALRPDGTYARRWCRAGAGGTGRWQPRGTVLVTGG 

TGGVGRHVARWLARQGTPCLVLASRRGPDADGVEELLTELADLGTRATVTACDVTDRE

QLRALLATVDDEHPLSAVFHVAATLDDGTVETLTGDRIERANRAKVLGARNLHELTRD 

ADLDAFVLFSSSTAAFGAPGLGGYVPGNAYLDGLAQQRRSEGLPATSVAWGTWAGSGM 

AEGPVADRFRRHGVMEMHPDQAVEGLRVALVQGEVAPIWDIRWDRFLLAYTAQRPTR 

LFDTLDEARRAAPGPDAGPGVAALAGLPVGEREKAVLDLVRTHAAAVLGHASAEQVPV 

DRAFAELGVDSLSALELRNRLTTATGVRLATTTVFDHPDVRTLAGHLAAELGGGSGRE 

RPGGEAPTVAPTDEPIAIVGMACRLPGGVDSPEQLWELIVSGRDTASAAPGDRSWDPA 

ELMVSDTTGTRTAFGNFMPGAGEFDA^FFGISPREALAMDPQQRHALETTWEALENAG 

IRPESLRGTDTGVFVGMSHQGYATGRPKPEDEVDGYLLTGNTASVASGRIAYVLGLEG 

PAITVDTACSSSLVALHVAAGSLRSGDCGLAVAGGVSVMAGPEVFREFSRQGALAPDG

RCKPFSDEADGFGLGEGSAFWLQRLSVAVREGRRVLG\AA/GSAVNQDGASNGLAAPS 

GVAQQRVIRRAWGRAGVSGGDVGWEAHGTGTRLGDPVELGALLGTYGVGRGGVGPW 

VGSVKANVGHVQAAAGWGVIKNAHjGLGRGLVGPMVCRGGLSGLVDWSSGGLVVADGV 

RGWPVGVDGVRRGGVSAFGVSGTNAHVWAEAPGSWGAERPVEGSSRGLVGWGGW 

PWLSAKTETALHAQARRLADHLETHPDVPMTDWWTLTQARQRFDRRAVLLAADRTQ 

AVERLRGLAGGEPGTGWSGVASGGGWFVFPGQGGQWGMARGLLSVPVFVESWEC 

DAWSSWGFSVLGVLEGRSGAPSLDRVDWQPVLFWMVSLARLWRWCGWPAAWG 

HSQGEIAAAWAGVLSVGDGARWALRARALRALAGHGGMASVRRGRDDVQKLIiDSGP 

WTGKLEIAAVNGPDAWVSGDPRAVTELVEHCDGIGVRARTIPVDYASHSAQVESLRE 

ELLSVLAGIEGRPATVPFYSTIiTGGFVDGTELDADYWYRNLRHPVRFHAAVEALAARD 

LTTFVEVSPHPVLSMAVGETLADVESAVTVGTLERDTDDVERFLTSLAEAHVHGVPVD 

WAAVLGSGTLVDLPTYPFQGRRFWLHPDRGPRDDVADWFHRVDWTATATDGSARLDGR 

WLVWPEGYTDDGWWEVRAALAAGGAEPWTTVEEVTDRVGDSDAWSMLGLADDGA 

AETLALLRRLDAQASTTPLWWTVGAVAPAGPVQRPEQATVWGLALVASLERGHRWTG

LLDLPQTPDPQLRPRLVEALAGAEDQVAVRADAVHARRIVPTPVTGAGPYTAPGGTIL 

VTGGTAGLGAVTARWLAERGAEHLALVSRRGPGTAGVDEWRDLTGLGVRVSVHSCDV 

GDRESVGALVQELTAAGDVVRGVVHAAGLPQQVPLTDMDPADLADWAVKVDGAVHIiA 

DLCPEAELFLLFSSGAGVWGSARQGAYAAGNAFLDAFARHRRDRGLPATSVAWGLWAA 

GGMTGDQEAVSFLRERGVRPMSVPRALEALERVLTAGETAVWADVDWAAFAESYTSA 

RPRPLLHRLVTPAAAVGERDEPREQTLRDRLAALPRAERSAELVRLVRRDAAAVLGSD 

AKAVPATTPFKDLGFDSLAAVRFRNRLAAHTGLRLPATLVFEHPNAAAVADLLHDRLG 

EAGEPTPVRSVGAGLAALEQALPDASDTERVELVERLERMLAGLRPEAGAGADAPTAG 

DDLGEAGVDELLDALERELDAR" (SEQ ID NO: 13) 

misc_feature 12505..13470
/gene="megAI"
/function="AT-L"

misc_feature 13576..13791
/gene="megAI"
/function="ACP-L"

misc_feature 13849..15126
/gene="megAI"
/function="KS1"

misc_feature 15427..16476
/gene="megAI"
/function="AT1"

misc_feature 17155..17694
/gene="megAI"
/function="KR1"

misc_feature 17947..18207
/gene="megAI"
/function="ACP1"

misc_feature 18268..19548
/gene="megAI"
/function="KS2"

misc_feature 19876..20910
/gene="megAI"
/function="AT2"

misc_feature 21517..22053
/gene="megAI"
/function="KR2"

misc_feature 22318..22575
/gene="megAI"
/function="ACP2"

gene 22867..33555

/gene="megAII"

CDS 22867..33555

/gene="megAII"

/note="polyketide synthase"

/codon_start=1

/transl_table=11

/product="megalomicin 6-deoxyerythronolide B synthase 2"

/translation="MTDNDKVAEYLRRATLDLRAARKRLRELQSDPIAWGMACRLPG

GVHLPQHLWDLLRQGHETVSTFPTGRGWDLAGLFHPDPDHPGTSYVDRGGFLDDVAGF 

DAEFFGISPREATAMDPQQRLLLETSWELVESAGIDPHSLRGTPTGVFLGVARLGYGE 

NGTEAGDAEGYSVTGVAPAVASGRISYALGLEGPSISVDTACSSSLVALHLAVESLRL

GESSIAWGGAAVMATPGVFVDFSRQRALAADGRSKAFGAAADGFGFSEGVSLVLLER 

LSEAESNGHEVLAVIRGSALNQDGASNGLAAPNGTAQRKVIRQALRNCGLTPADVDAV 

EAHGTGTTLGDPIEANALLDTYGRDRDPDHPLWLGSVKSNIGHTQAAAGVTGLLKMVL 

ALRHEELPATLHVDEPTPHVDWSSGAVRLATRGRPWRRGDRPRRAGVSAFGISGTNAH 

VIVEEAPERTTERTVGGDVGPVPLWSARSAAALRAQAAQVAELVEGSDVGLAEVGRS 

LAVTRARHEHRAAWASTRAEAVRGLREVAAVEPRGEDTVTGVAETSGRTWFLFPGQ

GSQWVGMGAELLDSAPAFADTIRACDEAMAPLQDWSVSDVLRQEPGAPGLDRVDWQP 

VLFAVMVSLARLWQSYGVTPAAWGHSQGEIAAAHVAGALSLADAARLWGRSRLLRS

LSGGGGMSAVALGEAEVRRRLRSWEDRISVAAVNGPRSWVAGEPEALREWGREREAE 

GVRVREIDVDYASHSPQIDRVRDELLTVTGEIEPRSAEITFYSTVDVRAVDGTDLDAG 

YWYRNLRETVRFADAMTRLADSGYDAFVEVSPHPVWSAVAEAVEEAGVEDAWVGTL

SRGDGGPGAFLRSAATAHCAGVDVDWTPALPGAATIPLPTYPFQRKPYWLRSSAPAPA 

SHDLAYRVSWTPITPPGDGVLDGDWLWHPGGSTGWVDGLAAAITAGGGRWAHPVDS 

VTSRTGLAEALARRDGTFRGVLSWATDERHVEAGAVALLTLAQALGDAGIDAPLWCL 

TQEAVRTPVDGDLARPAQAALHGFAQVARLELARRFGGVLDLPATVDAAGTRLVAAVL

AGGGEDWAVRGDRLYGRRLVRATLPPPGGGFTPHGTVLVTGAAGPVGGRLARWLAER 

GATRLVLPGAHPGEELLTAIRAAGATAWCEPEAEALRTAIGGELPTALVHAETLTNF

AGVADADPEDFAATVAAKTALPTVLAEVLGDHRLEREVYCSSVAGVWGGVGMAAYAAG 

SAYLDALVEHRRARGHASASVAWTPWALPGAVDDGRLRERGLRSLDVADALGTWERLL 

RAGAVSVAVADVDWSVFTEGFAAIRPTPLFDELLDRRGDPDGAPVDRPGEPAGEWGRR 

IAALSPQEQRETLLTLVGETVAEVLGHETGTEINTRRAFSELGLDSLGSMALRQRLAA 

RTGLRMPASLVFDHPTVTALARYLRRLWGDSDPTPVRVFGPTDEAEPVAWGIGCRF 

PGGIATPEDLWRWSEGTSITTGFPTDRGWDLRRLYHPDPDHPGTSYVDRGGFLDGAP 

DFDPGFFGITPREALAMDPQQRLTLEIAWEAVERAGIDPETLLGSDTGVFVGMNGQSY 

LQLLTGEGDRLNGYQGLGNSASVLSGRVAYTFGWEGPALTVDTACSSSLVAIHLAMQS 

LRRGECSLAIJVGGVTVMADPYTFVDFSAQRGLAADGRCKAFSAQADGFALAEGVAALV 

LEPLSKARRNGHQVLAVLRGSAVNQDGASNGLAAPNGPSQERVIRQALTASGLRPADV

DMVEAHGTGTELGDPIEAGALIAAYGRDRDRPLWLGSVKTNIGHTQAAAGAAGVIKAV 

LAMRHGVLPRSLHADELSPHIDWADGKVEVLREARQWPPGERPRRAGVSSFGVSGTNA

HVIVEEAPAEPDPEPVPAAPGGPLPFVLHGRSVQTVRSQARTLAEHLRTTGHRDLADT 

ARTLATGRARFDVRAAVLGTDREGVCAALDALAQDRPSPDWAPAVFAARTPVLVFPG 

QGSQWVGMARDLLDSSEVFAESMGRCAEALSPYTDWDLLDWRGVGDPDPYDRVDVLQ 

PVLFAVMVSLARLWQSYGVTPGAWGHSQGEIAAAHVAGALSLADAARWALRSRVLR

ELDDQGGMVSVGTSRAELDSVLRRWDGRVAVAAVNGPGTLWAGPTAELDEFLAVAEA

REMRPRRIAVRYASHSPEVARVEQRLAAELGTVTAVGGTVPLYSTATGDLLDTTAMDA 

GYWYRNLRQPVLFEHAVRSLLERGFETFIEVSPHPVLLMAVEETAEDAERPVTGVPTL 

RRDHDGPSEFLRNLLGAHVHGVDVDLRPAVAHGRLVDLPTYPFDRQRLWPKPHRRADT 

SSLGVRDSTHPLLHAAVDVPGHGGAVFTGRLSPDEQQWLTQHWGGRNLVPGSVLVDL 

ALTAGADVGVPVLEELVLQQPLVLTAAGALLRLSVGAADEDGRRPVEIHAAEDVSDPA 

EARWSAYATGTLAVGVAGGGRDGTQWPPPGATALTLTDHYDTLAELGYEYGPAFQALR
AAWQHGDWYAEVSLDAVEEGYAFDPVLLDAVAQTFGLTSRAPGKLPFAWRGVTLHAT
GATAVRWATPAGPDAVALRVTDPTGQLVATVDALWRDAGADRDQPRGRDGDLHRLE 
WVRLATPDPTPAAVVHVAADGLDDLLRAGGPAPQAVVVRYRPDGDDPTAEARHGVLWA 
ATLVRRWLDDDRWPATTLWATSAGVEVSPGDDVPRPGAAAVWGVLRCAQAESPDRFV 
LVDGDPETPPAVPDNPQLAVRDGAVFVPRLTPLAGPVPAVADRAYRLVPGNGGSIEAV 
AFAPVPDADRPLAPEEVRVAVRATGVNFRDVLLALGMYPEPAEMGTEASGWTEVGSG 
VRRFTPGQAVTGLFQGAFGPVAVADHRLLTPVPDGWRAVDAAAVPIAFTTAHYALHDL 
AGLQAGQS VLVHAAAGGVGMAAVALARRAGAJE VFATAS PAKHPTLRALGLDDDHI AS S 
RESGFGERFAARTGGRGVDWLNSLTGDLLDESARLLADGGVFVEMGKTDLRPAEQFR 
GRYVPFDLAEAGPDRLGEILEEWGLLAAGALDRLPVSVWELSAAPAALTHMSRGRHV 
GKLVLTQPAPVHPDGTVLVTGGTGTLGRLVARHLVTGHGVPHLLVASRRGPAAPGAAE 
LRADVBGLGAT IE I VACDTADREALAALLDS I PADR PLTGWHTAGVLADGLVTS I DG 
TATDQVLRAKVDAAWHLHDLTRDADLS FFVLFS SAAS VTiAGPGQGVYAAANGVLNALA 
GQRRALGLPAKALGWGLWAQASEMTSGLGDRIARTGVAAIjPTERALALFDAALRSGGE 
VLFPLSVDRSALRRAEYVPEVLRGAVRSTPRAANRAETPGRGLLDRLVGAPETDQVAA 
LAELVRSHAAAVAGYDSADQLPERKAFKDLGFDSLAAVELRNRLGVTTGVRLPSTtiVF 
DHPTPIiAVAEHLRSELFADSAPDVGVGARLDDLERALDAIiPDAQGHADVGARLEALIjR 
RWQSRRPPETEPVTISDDASDDELFSMLDRRLGGGGDV" (SEQ ID NO: 14) 
misc_feature 22957..24237
/gene="megAII"
/function="KS3"
misc_feature 24544..25581
/gene="megAII"
/function="AT3"
misc_feature 26230..26733
/gene="megAII"
/function="KR3 (inactive)"
misc_feature 26998..27258
/gene="megAII"
/function="ACP3"
misc_feature 27393..28590
/gene="megAII"
/function="KS4"
misc_feature 28897..29931
/gene="megAII"
/function="AT4"
misc_feature 29953..30477
/gene="megAII"
/function="DH4"
misc_feature 31396..32244
/gene="megAII"
/function="ER4"
misc_feature 32257..32799
/gene="megAII"
/function="KR4"
misc_feature 33052..33312
/gene="megAII"
/function="ACP4"
gene 33666..43271
/gene="megAIII"
CDS 33666..43271
/gene="megAIII"
/note="polyketide synthase"
/codon_start=1
/transl_table=11
/product="megalomicin 6-deoxyerythronolide B synthase 3"
/translation="MSESSGMTEDRLRRYLKRTVAEIiDSVTGRLDEVEYRAREPIAW
GMACRFPGGVDSPEAFWEFIRDGGDAIAEAPTDRGWPPAPRPRLGGLaAEPGAFDAAF. 
FGISPREALATDPQQRLMLEISWEALERAGFDPSSLRGSAGGVFTGVGAVDYGPRPDE
APE E VXG YVG IGTAS S VASGRVAYTLGLEG PAVTVDTACS SGLTAVHLAMESLRRDEC 
TLvTAGGVTVMSSPGAFTEFRSQGGLAEDGRCKPFSRAADGFGLAEGAGVLVLQRLSV 
ARAEGRPV1AVLRGSAINQDGASNGLTAPSGPAQRRVIRQALERARLRPVDVDYVEAH 
GTGTRLGDPIEAHALLDTYGADREPGRPLWVGSVXSNIGHTQAAAGVAGvTlKTVlAIiR 
HREIPATLHFDEPSPHVDWDRGAVSWSETRPWPVGERPRRAGVSSFGISGXNAHVIV
EEAPSPQAADLDPTPGPATGATPGTDAAPTAEPGAEAVALVFSARDERALRAQAARLA

DRLTDDPAPSLRDTAFTLVTRRATWEHRAVWGGGEEVLAGLRAVAGGRPVDGAVSGR 

ARAGRRWLVFPGQGAQWQGMARDLLRQSPTFAESIDACERALAPHVDWSIjREVLDGE 

QSLDPVDWQPVLFAVt-TVSLARLWQSYGVTPGAWGHSQGEIAAAHVAGALSLADAAR 

WALRSRVLRRLGGHGGMZISFGLHPDQAAERIARFAGALTVASVNGPRSWLAGENGP 

LDELIAECEAEGVTARRIPVDYASHSPQVESLREELLAALAGVRPVSAGIPLYSTLTG

QVIETATMDADYWFANLREPVRFQDATRQLAEAGFDAFVEVSPHPVLTVGVEATLEAV 

LPPDADPCVTGTLRRERGGLAQFHTALAEAYTRGVEVDWRTAVGEGRPVDLPVYPFQR 

QNFWLPVPLGRVPDTGDEWRYQLAWHPVDLGRSSLAGRVLWTGAAVPPAWTDWRIX3 

LEQRGATWLCTAQSRARIGAALDAVDGTALSTWSLLALAEGGAVDDPSLDTLALVQ 

ALGAAGIDVPLWIiVTRDAAAVTVGDDVDPAQAMVGGLGRWGVESPARWGGLVDLREA 

DADSARSLAAILADPRGEEQFAIRPDGVTVARLVPAPARAAGTRWTPRGTVLVTGGTG 

GIGAHLARWLAGAGAEHLVLLNRRGAEAAGAADLRDELVALGTGVTITACDVADRDRL

AAVLDAARAQGRVWAVFHAAGISRSTAVQELTESEFTEITDAKVRGTANLAELCPEL 

DALVLFSSNAAVWGSPGLASYAAGNAFLDAFARRGRRSGLPVTSIAWGLWAGQNMAGT 

EGGDYLRSQGLRAMDPQRAIEELRTTLDAGDPWVSWDLDRERFVELFTAARRRPLFD 

ELGGVRAGAEETGQESDLARRLASMPEAERHEHVARLVRAEVAAVLGHGTPTVIERDV 

AFRDLGFDSMTAVDLRNRLAAVTGVRVATTIVFDHPTVDRLTAHYLERLVGEPEATTP 

AAAWPQAPGEADEPIAIVGMACRLAGGVRTPDQLWDFIVADGDAVTEMPSDRSWDLD 

ALFDPDPERHGTSYSRHGAFLDGAADFDAAFFGISPREALAMDPQQRQVLETTWELFE 

NAGIDPHSLRGTDTGVFLGAAYQGYGQNAQVPKESEGYLLTGGSSAVASGRIAYVLGL

EGPAITVDTACSSSLVALHVAAGSLRSGDCGLAVAGGVSVMAGPEVFTEFSRQGALAP 

DGRCKPFSDQADGFGFAEGVAWLLQRLSVAVREGRRVLGVWGSAVNQDGASNGLAA 

PSGVAQQRVIRRAWGRAGVSGGDVGWEAHGTGTRLGDPVELGALLGTYGVGRGGVGP 

VWGSVKANVGHVQAAAGWGVIKWLGLGRGLVGPMVCRGGLSGLVDWSSGGLWAD 

GVRGWPVGVDGVRRGGVSAFGVSGTNAHWVAEAPGSWGAERPVEGSSRGLVGVAGG 

WPVVLSAKTETALTELARRLHDAVDDTVALPAVAATLATGRAHLPYRAALLARDHDE 

LRDRLRAFTTGSAAPGWSGVASGGGWFVFPGQGGQWVGMARGLLSVPVFVESWEC 

DAWSSVVGFSVLGVLEGRSGAPSLDRVDVVQPVLFVVMVSLARLWRWCGVVPAAVVG 

HSQGEIAAAWAGVLSVGDGARWALRARALRALAGHGGMVSLAVSAERARELIAPWS 

DRISVAAVNSPTSVWSGDPQALAALVAHCAETGERAKTLPVDYASHSAHVEQIRDTI 

LTDLADVTARRPDVALYSTLHGARGAGTDMDARYWYDNLRSPVRFDEAVEAAVADGYR 

VFVEMSPHPVLTAAVQEIDDETVAIGSLHRDTGERHLVAELARAHVHGVPVDWRAILP

ATHPVPLPNYPFEATRYWLAPTAADQVADHRYRVDWRPLATTPAELSGSYLVFGDAPE 

TLGHSVEKAGGLLVPVAAPDRESLAVALDEAAGRLAGVLSFAADTATHLARHRLLGEA 

DVEAPLWLVTSGGVALDDHDPIDCDQAMVWGIGRVMGLETPHRWGGLVDVTVEPTAED 

GWFAALLAADDHEDQVALRDGIRHGRRLVRAPLTTRNARWTPAGTALVTGGTGALGG 

HVARYLARSGVTDLVLLSRSGPDAPGAAELAAELADLGAEPRVEACDVTDGPRLRALV 

QELREQDRPVRIWHTAGVPDSRPLDRIDELESVSAAKVTGARLLDELCPDADTFVLF 

SSGAGVWGSANLGAYAAANAYLDALAHRRRQAGRAATSVAWGAWAGDGMATGDLDGLT A 

RRGLRAMAPDRALRACTRRWTTHDTCVSVADVDWDRFAVGFTAARPRPLIDELVTSAP A 

VAAPTAAAAPVPAMTADQLLQFTRSHVAAILGHQDPDAVGLDQPFTELGFDSLTAVGIi 

RNQLQQATGRTLPAALVFQHPTVRRLADHLAQQLDVGTAPVEATGSVLRDGYRRAGQT 

GDVRSYLDLLANLSEFRERFTDAASLGGQLELVDLADGSGPVTVICCAGTAALSGPHE 

FARLASALRGTVPVRALAQPGYEAGEPVPASMEAVLGVQADAVLAAQGDTPFVLVGHS

AGALMAYALATELADRGHPPRGV^LDVYPPGHQEAVHAWLGELTAALFDHETVRMD 

TRLTALGAYDRIjTGRWRPRDTGLPTLWAASEPMGEWPDDGWQSTWPFGHDRVTVPGD 

HFSMVQEHADAIARHIDAWLSGERA" (SEQ ID NO: 15)

misc_feature 33780..35027
/gene="megAIII"
/function="KS5"
misc_feature 35385..36419
/gene="megAIII"
/function="AT5"
misc_feature 37068..37604
/gene="megAIII"
/function="KR5"
misc_feature 37860..38120
/gene="megAIII"
/function="ACP5"
misc_feature 38187..39470
/gene="megAIII"
/function="KS6"
misc_feature 39795..40811
/gene="megAIII"
/function="AT6"
misc_feature 41406..41936
/gene="megAIII"
/function="KR6"
misc_feature 42168..42425
/gene="megAIII"
/function="ACP6"
misc_feature 42585..43271
/gene="megAIII"
/function="TE"
gene 43268..44344
/gene="megCII"
CDS 43268..44344
/gene="megCII"
/codon_start=1
/transl_table=11
/product="TDP-4-keto-6-deoxyglucose 3,4-isomerase"

/translation="MNTTDRAVLGRRLQMIRGLYWGYGSNGDPYPMLLCGHDDDPHRW

YRGLGGSGVRRSRTETWVVTDHATAVRVLDDPTFTRATGRTPEWMRAAGAPASTWAQP 

FRDVHAASWDAELPDPQEVEDRLTGLLPAPGTRLDLVRDLAWPMASRGVGADDPDVLR 

AAWDARVGLDAQLTPQPLAVTEAAIAAVPGDPHRRALFTAVEMTATAFVDAVLAVTAT 

AGAAQRLADDPDVAARLVAEVLRLHPTAHLERRTAGTETVVGEHTVAAGDEVVVVVAA 

ANRDAGVFADPDRLDPDRADADRALSAQRGHPGRLEELVWLTTAALRSVAKALPGLT 

AGGPWRRRRSPVLRATAHCPVEL" (SEQ ID NO: 16)

gene 44355..45623

/gene="megCIII"

CDS 44355..45623

/gene="megCIII"

/codon_start=1

/transl_table=11

/product="TDP-desosamine glycosyltransferase"

/translation="MRWFSSMASKSHLFGLVPLAWAFRAAGHEVRWASPALTDDIT

AAGLTAVPVGTDVDLVDFMTHAGYDIIDYVRSLDFSERDPATSTWDHLLGMQTVLTPT 

FYALMSPDSLVEGMISFCRSWRPDWSSGPQTFAASIAATVTGVAHARLLWGPDITVRA

RQKFLGLLPGQPAAHREDPLAEWLTWSVERFGGRVPQDVEELWGQWTIDPAPVGMRL 

DTGLRTVGMRYVDYNGPSWPDWLHDEPTRRRVCLTLGISSRENSIGQVSVDDLLGAL 

GDVDAEIIATVDEQQLEGVAHVPANIRTVGFVPMHALLPTCAATVHHGGPGSWHTAAI

HGVPQVILPDGWDTGVRAQRTEDQGAGIALPVPELTSDQLREAVRRVLDDPAFTAGAA 

RMRADMLAEPSPAEWDVCAGLVGERTAVG" (SEQ ID NO: 17)

gene 45620..46591

/gene="megBII"

CDS 45620..46591

/gene="megBII"

/codon_start=1

/transl_table=11

/product="TDP-4-keto-6-deoxyglucose 2,3 dehydratase"

/translation="MSTDATHVRLGRCALLTSRLWLGTAALAGQDDADAVRLLDHARS

RGVNCLDTADDDSASTSAQVAEESVGRWLAGDTGRREETVLSVTVGVPPGGQVGGGGL 

SARQIIASCEGSLRRLGVDHVDVLHLPRVDRVEPWDEVWQAVDALVAAGKVCYVGSSG 

FPGWHIVAAQEHAVRRHRLGLVSHQCRYDLTSRHPELEVLPAAQAYGLGVFARPTRLG 

GLLGGDGPGAAAARASGQPTALRSAVEAYEVFCRDLGEHPAEVALAWVLSRPGVAGAV 

VGARTPGRLDSALRACGVALGATELTALDGIFPGVAAAGAAPEAWLR" (SEQ ID NO: 18) 

gene complement(46660..47403)

/gene="megH"

CDS complement(46660..47403)

/gene="megH"

/note="putative thioesterase"
/codon_start=1
/transl_table=11
/product="TEII"

/translation="MNTWLRRFGSADGHRARLYCFPHAGAAADSYLDLARALAPEVDV
WAVQYPGRQDRRDERALGTAGEIADEVAAVLRDLVGEVPFALFGHSMGALVAYETARR 
LEARPGVRPLRLFVSGQTAPRVHERRTDLPDEDGLVEQMRRLGVSEAALADQGLLDMS
LPVLRADHRVLRSYAWQAGPPLRAGITTLCGDTDPLTTVEDAQRWLPYSWPGRTRTF

PGGHFYLADHVGEVAESVAPDLLRLTPTG" (SEQ ID NO: 19) 
gene complement(47411..>47981)

/gene="megF"
CDS complement(47411..>47980)

/gene="megF"

/codon_start=1

/transl_table=11

/product="C-6 hydroxylase"

/translation="IRVQDDDADRLSRDELTSIALVLLLAGFEASVSLIGIGTYLLLT
HPDQLALVRKDPALLPGAVEEILRYQAPPETTTRFATAEVEIGGVTIPAYSTVLIANG 
AANRDPGQFPDPDRFDVTRDSRGHLTFGHGIHYCMGRPLAKLEGEVALGALFDRFPKL 
SLGFPSDEVWRRSLLLRGIDHLPVRPNG" (SEQ ID NO: 20) 
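
The CDS coordinates listed above can be sanity-checked with a short script. The following is only an illustrative sketch, not part of the disclosed listing: the helper name `span_length` and the dictionary `cds_locations` are hypothetical, and the check simply confirms that each CDS span quoted in this feature table covers a whole number of codons.

```python
import re

def span_length(location: str) -> int:
    """Return the number of bases covered by a simple GenBank-style location.

    Handles plain 'a..b' spans, optionally wrapped in complement(...), and
    ignores the '<' / '>' markers used for partial ends. Joined (multi-span)
    locations are not supported in this sketch.
    """
    compact = location.replace(" ", "")  # tolerate OCR-style stray spaces
    match = re.search(r"[<>]?(\d+)\.\.[<>]?(\d+)", compact)
    if match is None:
        raise ValueError(f"unrecognised location: {location!r}")
    start, end = int(match.group(1)), int(match.group(2))
    return end - start + 1

# CDS locations copied from the feature table above (hypothetical container name).
cds_locations = {
    "megBV": "complement(9226..10479)",
    "megBIV": "complement(10483..11424)",
    "megAI": "12181..22821",
    "megAII": "22867..33555",
    "megAIII": "33666..43271",
    "megCII": "43268..44344",
    "megCIII": "44355..45623",
    "megBII": "45620..46591",
    "megH": "complement(46660..47403)",
    "megF": "complement(47411..>47980)",
}

for gene, location in cds_locations.items():
    length = span_length(location)
    # A CDS span should contain a whole number of codons.
    print(f"{gene}: {length} bp, whole codons: {length % 3 == 0}")
```

A span that is not a multiple of three would suggest a transcription error in the coordinates rather than a property of the gene itself.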

BASE COUNT 5962 a 16875 c 18045 g 7099 t 
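
As a quick consistency check on the BASE COUNT line, the four counts can be summed and compared with the highest coordinate in the feature table. This is a minimal sketch for illustration only; the variable names are assumptions and the numbers are copied from the line above.

```python
# Base counts copied from the BASE COUNT line of the listing.
base_count = {"a": 5962, "c": 16875, "g": 18045, "t": 7099}

# Total sequence length implied by the counts.
total = sum(base_count.values())

# Fraction of G and C bases in the sequence.
gc_fraction = (base_count["g"] + base_count["c"]) / total

print(f"total bases: {total}")
print(f"G+C fraction: {gc_fraction:.3f}")
```

The total, 47981, agrees with the upper coordinate of the megF gene feature (>47981), and the G+C fraction comes out at roughly 73%.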

ORIGIN 

1 ctcgagccga tgctcggcgg cgcggtgggc caaccagtcg tggacgtcgt cggtggcggt 
61 gggaggtccg ccgtgccgag tcaggaaacg tattgccgat tgtgtggatt ccggagtcgc 
121 atgaccgttg acccgatccc ccatacgcct ctcccgtgat gtcgtgggcg gtccgtgcgg 
181 taccgcccgg actgacattc gtcgatcaag accccgccca gtgtagggct ccgcccgcga 
241 cgggagaagg tccgtcgaac aacttccggg tgaccggtcg ccggcgtcgg tgaaacgggc 
301 gtcggagcac ccgatcattg ctgtcggtga acttcctaac tgtcggcgcg cacatctttc 
361 tgaccggtgt gttccgtggt atgacgcgtt cccggcccgt ctggaactgt gcgtgggact 
421 gaccggttgc ggcgtgtttt cgcccgtttc cgaactgcgg attcgtcgat cgcgcaggtg 
481 ggagcgggtg gctgaccggg atgatctgca atcatggcgc tcaatgacga tctcttgtag 
541 catggtccgc gccgagggtc cgacaggccc gaaacgcccg gcatccagcc tgttcgacga 
601 cgtcgacatc accgtgcaag ccgcgatgac accgacacca cgccatgctg gtgccgcact 
661 ggaagggtgg cgcgatcagg gaaatggccg tgtcactaga cagacgccaa acagctgtcc 
721 gggcctgcgg aaacagcatc gatctgcgtc agccgttcat tgccccggcg gcaccgcctt 
781 ggaaatccgt gccaccggtc gtccgcagtg acgatcgcgg acccgggttt cgagacagca 
841 ggtagtaggc gatgcaggcg tttcgtctcg cgccggacgc gtcgcactag gtggaatccg 
901 tcacagtctt caatccggga gcgttctatg gcagttggcg atcgaaggcg gctgggccgg 
961 gagttgcaga tggcccgggg tctctactgg gggttcggtg ccaacggcga tctgtactcg 
1021 atgctcctgt ccggacggga cgacgacccc tggacctggt acgaacggtt gcgggccgcc 
1081 ggacggggac cgtacgccag tcgggccgga acgtgggtgg tcggtgacca ccggaccgcc 
1141 gccgaggtgc tcgccgatcc gggcttcacc cacggcccgc ccgacgctgc ccggtggatg 
1201 caggtggccc actgcccggc ggcctcctgg gccggcccct tccgggagtt ctacgcccgc 
1261 accgaggacg cggcgtcggt gacagtggac gccgactggc tccagcagcg gtgcgccagg 
1321 ctggtgaccg agctggggtc gcgcttcgat ctcgtgaacg acttcgcccg ggaggtcccg 
1381 gtgctggcgc tcggtaccgc gcccgcactc aagggcgtgg accccgaccg tctccggtcc 
1441 tggacctcgg cgacccgggt atgcctggac gcccaggtca gcccgcaaca gctcgcggtg 
1501 accgaacagg cgctgaccgc cctcgacgag atcgacgcgg tcaccggcgg tcgggacgcc 
1561 gcggtgctgg tgggggtggt ggcggagctg gcggccaaca cggtgggcaa cgccgtcctg 
1621 gccgtcaccg agcttcccga actggcggca cgacttgccg acgacccgga gaccgcgacc 
1681 cgtgtggtga cggaggtgtc gcggacgagt cccggcgtcc acctggaacg ccgcaccgcc 
1741 gcgtcggacc gccgggtggg cggggtcgac gtcccgaccg gtggcgaggt gacagtggtc 
1801 gtcgccgcgg cgaaccgtga tcccgaggtc ttcaccgatc ccgaccggtt cgacgtggac 
1861 cgtggcggcg acgccgagat cctgtcgtcc cggcccggct cgccccgcac cgacctcgac 
1921 gccctggtgg ccaccctggc cacggcggcg ctgcgggccg ccgcgccggt gttgccccgg 
1981 ctgtcccgtt ccgggccggt gatcagacga cgtcggtcac ccgtcgcccg tggtctcagc 
2041 cgttgcccgg tcgagctgta gaggaagaac gatgcgcgtc gtgttttcat cgatggctgt 
2101 caacagccat ctgttcgggc tggtcccgct cgcaagcgcc ttccaggcgg ccggacacga 
2161 ggtacgggtc gtcgcctcgc cggccctgac cgacgacgtc accggtgccg gtctgaccgc 
2221 cgtgcccgtc ggtgacgacg tggaacttgt ggagtggcac gcccacgcgg gccaggacat
2281 cgtcgagtac atgcggaccc tcgactgggt cgaccagagc cacaccacca tgtcctggga 
2341 cgacctcctg ggcatgcaga ccaccttcac cccgaccttc ttcgccctga tgagccccga 
2401 ctcgctcatc gacgggatgg tcgagttctg ccgctcctgg cgtcccgact ggatcgtctg 
2461 ggagccgctg accttcgccg ccccgatcgc ggcccgggtc accggaaccc cgcacgcccg 
2521 gatgctgtgg ggtccggacg tcgccacccg ggcccggcag agcttcctgc gactgctggc 
2581 ccaccaggag gtggagcacc gggaggatcc gctggccgag tggttcgact ggacgctgcg 
2641 gcgcttcggc gacgacccgc acctgagctt cgacgaggaa ctggtgctgg ggcagtggac 
2701 cgtggacccc atccccgagc cgctgcggat cgacaccggc gtccggacgg tgggcatgcg 
2761 gtacgtcccc tacaacggcc cctcggtggt gcccgcctgg ctgttgcggg aacccgaacg 
2821 tcggcgggtc tgcctgaccc tcggcggttc cagccgggaa cacggcatcg ggcaggtctc 
2881 catcggcgag atgttggacg ccatcgccga catcgacgcc gagttcgtgg ccaccttcga
2941 cgaccagcag ttggtcggcg tgggcagcgt tccggcaaac gtccgtaccg ccgggttcgt
3001 gccgatgaac gtcctgctgc ccacctgcgc ggccaccgtg caccacggcg gcaccggcag 
3061 ttggctgacc gccgccatcc acggcgtacc gcagatcatc ctctcggacg ccgacaccga 
3121 ggtgcacgcc aagcagctcc aggacctcgg cgcggggctg tcgctcccgg tcgcggggat 
3181 gaccgccgag cacctgcgtg gggcgatcga gcgggttctc gacgagccgg cgtaccgcct 
3241 cggtgcggag cggatgcggg acgggatgcg gaccgacccg tcgccggccc aggtggtcgg 
3301 catctgtcag gacctggccg ccgaccgggc ggcacgcggc aggcagccgc gtcgaaccgc 
3361 cgagccgcac ctgccgcgat gacttccacc accaccggga ccggctgatg ccggtcccgg 
3421 aatccacacg ccgactttcc ttctgacacg agggggcccc ggtggttacc tccaccaact 
3481 tggacacgac agcacggccg gcactgaact cgttgaccgg gatgcggttc gtcgccgcct 
3541 tcctggtctt cttcacgcac gtcctgtcga ggctcatccc gaaeagctac gtgtacgccg 
3601 acggcctgga cgccttctgg cagaccaccg gacgggtggg ggtgtcgttc ttctttattc 
3661 tcagcggttt cgtgctgacc tggtcggcgc gggccagcga ctcggfcgtgg tcgttctggc
3721 gcagacgggt ctgcaagctc ttccccaacc acctggtcac cgccttcgcc gccgtggtgt 
3781 tgttcctggt caccgggcag gcggtgagcg gtgaggcgct gatcccgaac ctcctgctga 
3841 tccacgcctg gttcccggcc ctggagatct ccttcggcat caacccggtg agctggtcgt 
3901 tggcctgcga ggcgttcttc tacctgtgct tcccgctgtt cctgttctgg atctccggta 

3961 tccgcccgga gcggctgtgg gcctgggccg ccgtggtgtt cgccgcgatc tgggcggtac
4021 cggtggtcgc cgacctcctg ctgccgagtt ccccgccgct gatcccgggg cttgagtact 
4081 ccgccatcca ggactggttc ctctacacct tccctgcgac gcggagcctg gagttcatcc 
4141 tcgggatcat cctggcccgc atcctgatca ccggtcggtg gatcaacgtc gggctgctcc 
4201 ccgcggtgct gttgttcccg gtcttcttcg tcgcctcgct cttcctgccg ggtgtctacg 
4261 ccatctcctc gtcgatgatg atccttcccc tggttctgat catcgccagc ggcgcgacgg 
4321 ccgacctcca gcagaagcgc accttcatgc gtaaccgggt gatggtgtgg ctcggcgacg 
4381 tctccttcgc gctctacatg gtccacttcc tggtgatcgt ctacggggcg gacctgctgg 
4441 ggttcagcca gaccgaggac gccccgctgg gtctcgcact cttcatgatc attccgttcc 
4501 tcgcggtctc cctggtgctg tcgtggctgc tgtacaggtt cgtcgagcta cccgtcatgc 
4561 gtaactgggc ccgcccggcc tccgcccggc gcaaacccgc cacggaaccc gaacagaccc 
4621 cttcccgccg gtaagaagga cggtgcatcg gtgaccacct acgtctggtc ctatctgttg 
4681 gagtacgaga gggaacgagc cgacatcctc gatgcggtgc agaaggtctt cgccagtggc 
4741 agcctgatcc tcggtcagag tgtggagaac ttcgagaccg agtacgcccg ctaccacggg 
4801 atcgcgcact gcgtgggcgt cgacaacggc accaacgctg tgaaactcgc gctggagtcg 
4861 gtaggtgtcg gacgcgacga cgaggtcgtc acggtctcca acaccgccgc ccccacagtc 

4921 ctggccatcg acgagatcgg cgcccggccg gtcttcgtgg acgtccgcga cgaggactac
4981 ctcatggaca ccgacctggt ggaggcggcg gtcaccccgc gtaccaaggc catcgtcccg 
5041 gtgcacctgt acgggcagtg cgtggacatg acagccctgc gggaactggc cgaccggcgg 
5101 ggcctcaagc tcgtggagga ctgcgcccag gcccacggtg cccggcggga cggtcggctg 
5161 gccgggacga tgagcgacgc ggcggccttc tcgttctacc cgacgaaggt cctcggcgcc 
5221 tacggcgacg gcggcgcggt cgtcaccaac gacgacgaga cagcccgcgc cctgcgacgg 
5281 ctgcggtact acgggatgga ggaggtctac tacgtcaccc ggaccccggg tcacaacagc 
5341 cgcctcgacg aggtgcaggc cgagatcctg cggcgcaaac tgacccggct cgacgcgtac 
5401 gtcgcgggtc ggcgggcggt cgcccagcgg tacgtcgacg ggctcgccga cctccaagac 
5461 tcgcacggcc tcgaactccc agtggtcacc gacggcaacg aacacgtctt ctacgtgtac 
5521 gtcgtccgcc acccgcgccg cgacgagatc atcaagcgtc tccgggacgg gtacgacatc 
5581 tccctgaaca tcagctaccc ctggccggtg cacaccatga ccggcttcgc ccacctcggt 
5641 gtcgcgtcgg ggtcgctgcc ggtcaccgaa cggctggccg gcgagatctt ctcccttccc 
5701 atgtacccct ccctccctca cgacctgcag gacagggtga tcgaggcggt gcgggaggtc 
5761 atcaccgggc tgtgacgagc ccgcgtgtcg tcagcgaaga cccactctgg aagggccggt 
5821 catgccgaac agccactcga ccacgtcgag caccgacgtc gccccgtacg agcgggcgga 
5881 catctaccac gacttctacc acggccgtgg caagggatac cgtgccgaag ccgacgcgct 
5941 cgtggaggtc gcccgcaagc acaccccaca ggcggcgacc ctgctggacg tggcctgcgg 
6001 gaccggatcc cacctggtcg agctggcgga cagcttccgg gaggtggtgg gggtcgacct 
6061 gtcggccgcc atgctcgcca ccgccgcccg caacgacccc gggcgggaac tgcaccaggg 
6121 cgacatgcgc gacttctccc tcgaccgcag gttcgacgtc gtcacctgca tgttcagctc 
6181 caccggttac ctcgtcgacg aggccgaact ggaccgtgcc gtggcgaacc tggccggtca 
6241 cctcgcgcct ggcggcaccc tcgtcgtgga gccctggtgg ttcccggaga cgttccggcc 
6301 cggctgggtc ggggccgacc tggtcaccag cggtgaccgg aggatctccc ggatgtcgca 
6361 caccgtcccg gcgggtctgc ccgaccgcac cgcctcccgg atgaccatcc actacacggt 
6421 ggggtcaccg gaggccggga tcgagcactt caccgaggtg cacgtgatga ccctgttcgc 
6481 ccgcgccgcc tacgagcagg ccttccagcg ggcgggcctg agctgctcgt acgtcggcca
6541 cgacctgttc tcgccgggcc ttttcgtcgg ggtcgccgcg gagccggggc ggtgagggtc 
6601 gaggagctgg gcatcgaggg ggtcttcacc ttcaccccgc agacgttcgc cgacgagcgg 
6661 ggggtgttcg gcacggcgta ccaggaggac gtgttcgtgg cggcgctcgg ccgcccgctg 
6721 ttcccggtgg cccaggtcag caccacccgg tcccggcggg gtgtggtccg gggggtgcac
6781 ttcacgacga tgcccggctc catggcgaag tacgtctact gcgccagggg tagggcgatg
6841 gacttcgccg tcgacatccg gcccggttcc ccgaccttcg gccgggccga gccggtcgag
6901 ctctccgccg agtcgatggt cgggctgtac cttcccgtgg gcatgggcca cctgttcgtc
6961 tccctggagg acgacaccac cctcgtctac ctgatgtccg ccggttacgt ccccgacaag
7021 gaacgggcgg tgcaccccct ggatccggag ctggcgttgc cgatcccggc cgacctcgac
7081 ctcgtcatgt ccgagcggga ccgggtcgca cccaccctcc gggaggcccg ggaccagggg
7141 atcctgcccg actacgccgc ctgccgggcc gccgcgcacc gggtggtgcg gacgtgaccc
7201 cggccgggcg tgcgggccgg tggtggtgct cggcgcgtcg ggtttcctgg gttcggcggt
7261 cacccacgcc ctggccgacc tcccggtgcg ggtgcggctc gtcgcccggc gggaggtcgt
7321 cgtgccctcc ggtgccgtcg ccgactacga gacgcaccgg gtggacctca ccgaacccgg
7381 agcgctcgcg gaggtggtcg cggacgcccg ggcggtcttc ccgttcgccg cccagatcag
7441 gggtacgtca gggtggcgga tcagcgagga cgacgtggtc gccgaacgga cgaacgtcgg
7501 cctggtccgg gacctgatcg ccgtcctgtc ccgctcgccg cacgccccgg tggtggtctt
7561 cccgggcagc aacacgcagg tcggcagggt caccgccggc cgggtcatcg acggcagcga
7621 gcaggaccac cccgagggcg tctacgacag gcagaaacac accggggaac agctgctcaa
7681 ggaggccact gcggccgggg cgatccgggc gaccagtctg cggctgcccc cggtgttcgg
7741 ggtgcccgcc gccggcaccg ccgacgaccg gggggtggtc tccaccatga tccgtcgggc
7801 cctgaccggc caaccgctga cgatgtggca cgacggcacc gtccggcgtg aactgctgta
7861 cgtgaccgac gccgcccggg ccttcgtcac cgccctggac cacgccgacg cgctcgccgg
7921 acgccacttc ctgttgggga cggggcgttc ctggccgctg ggcgaggtct tccaggcggt
7981 ctcgcgcagc gtcgcccggc acaccggcga ggacccggtg ccggtggtct cggtgccgcc
8041 tccggcgcac atggacccgt cggacctgcg cagcgtggag gtcgaccccg cccggttcac
8101 ggctgtcacc gggtggcggg ccacggtcac gatggcggag gcggtcgacc ggacggtggc
8161 ggcgttggcc ccccgccggg ccgccgcccc gtccgagccc tcctgaccgg ggtcacccgg
8221 gttcgtccta cggcaccggc ccgtcgacgg ccggtgccgg gaagatcgct tcgagttccc
8281 ggagttcctc ctcgcccagc gtcagctcgg cggcccgtaa cgccgagtcg agctgctcgg
8341 gtgtgcgggg gccgatgaca gcgcccagga tcccggggcg ggacaggacc caggccagac
8401 cgacctcggc cgggtccgcg ccgaggcgtc ggcagtagtc ctcgtacgcc tcgacgaggg
8461 ggcgtacggc ggggaggagc acctgggcgc gtccctgcgc cgacttgacg gcggttccgg
8521 ctgccaactt ctccagtacg ccgctgagca gcccgccgtg caggggggac caggcgaaca
8581 cgcccacccc gtacgcctgg gcggcgggca ggacgtccag ctcggggtgg cggacggcca
8641 ggttgtacag gcactggtgg gagatcatgc cgagcaggtt gcggcgtgcc gcgctctcct
8701 gggcggcggc gatgtgccag cccgccaggt tggaggagcc gacgtacccg accttcccac
8761 tgccgaccag atgttcggcg gcctgccaca cctcgtccca cggtgcggcg cggtcgatgt
8821 ggtgcgtctg gtagatgtcg atgtggtcga ccccgaggcg gcggagggag ttctcgcagg
8881 cggcgacgat gtgtcgggcg gagagcccgc cgtcgttgac ccgttcgctc atctcgctgc
8941 ccaccttggt cgccaggacg gtctcctcgc gtcgacctcc gccctgggcg aaccaccgtc
9001 cgacgagttc ctcggtgtgg cccttgtaga gccgccagcc gtagatgtcg gcggtgtcga
9061 tgcagttgac gccccgctcg agggcgtggt ccatcagccg cagcgcgtcg tcgtcggtca
9121 cccgtccact gaagttcacg gtgccgagcc agagtcggct ggtgtgcaac gccgatcgtc
9181 cgacgcgtac ccgggcggac ccggccccgg tggttcccac gtcggtcacc tgtcggcgcg
9241 gtgctggtgg gcgagcgcct ccagcacggg tacgacctcg gcgggggtcg gcgcggccag
9301 cgcctcctgc cgcagcttct cggcgttctc ggcgtgggaa cggtcctcga ccactgtggc
9361 gagagcctgc cagagggtgt cggcgtcgac ctcgtccgga cggaggaaga cacccgctcc
9421 cagctcggcg gtgcgctgac cacgcaggac acagtcccac tcgtgggcga cggagatctg
9481 cggtacgccg tggtgcagcg cggtggccca gcttccggca ccgccgtggt ggatgacggc
9541 ggcacagccc ggcagcagga tgttcatggg aacgaagtcc accaggcgga cgttgtccgg
9601 caccgacgcc ggatcgagcc cggagcgggt caccacgatc tcgccgtcga accgcgcgag
9661 ggtggccagt gtccggagga actcctgcgg gttcgaggtg atgcccagcg ccgagtatcc
9721 cccggtgaag cagacccggc ggactccgtc cgaggtcctg agccactgcg gcacgacgga
9781 ggacccgttg tagggcaaag tccgggtgtg caccgactcc agtccggtct ccaggcggaa
9841 gctctcgggc agctggtcga cgctccactg tccgacagcg aggtcctcgc tgtagtcgag
9901 gccgaaccgg ccggcgacct cggtgagcca gccgccgagc gggtccggcc ggtcgtcggc
9961 gggacgctgc ccgcgcaggt cctgggagcg gctgcggaag tagccggtga ggtcgctgcc
10021 ccacagcagc cgggcgtggg cggccccgca ggccttggcc gcgaccgccc cggcgaaggt
10081 gaagggctcc cagagcacca ggtcgggacg ccagtccatg gcgaactcga cgagttcgtc
10141 gacgaaggag tcgttgttga ccaccgggaa gacgaaccgg gaggtggcct cctcgatgcc
10201 gtgcaggaac tcccacgagc gcagttccgg tccgcgtcgg gcgaagtcca ggtcggtggt
10261 gtagcggtgc acctgcgcgg cggcctcagg ggagatgtcg aagagtcggt ggtccgagcc
10321 gagtggcacc gaggtcagtc ccgcgccgac gacgacgtcg gtgagctcgg gctgactggc
10381 cacccggacg tcgtggccgg cggtgtgcag cgcccaggcc agggggacga ggccctggaa
10441 gtgggtacgg tgcgcgaacg aggtgagcag gacccgcact ggtcactcct tggtcgagat
10501 gagggcggca acggtccggt cgatgccctc ggccagcggc acccgggggt gccagccggt
10561 cagcgtccgg aactcggtgg agtcgaagtc gtcgctgcgg aagtcgttgg cctcggcgtt
10621 ctccggtgga gggacgctga cgacgggcac cgcagggttg ccggtctgac gtgccacgct
10681 ggcggcgacg gtctcgaaga tctcgccgag gggtcgggcc tcgtccgcgc tcggcgtcca 
10741 gacgtcgccg accagcgcct cgtggttgtg cagtgcggcg gtgaacgcgg tggccacgtc 
10801 ctcgacgtgc aggaggttgc ggcgcacgct gccctcgtgc cacatcgtga tcggctcacc 
10861 ggcgagggct cgccggatca tggcggtgac gacaccccgg ccggtctgcc ccgacgggcc 
10921 gctgtggccg tagatcgcgg gcaggcgcag gatcaccccg tcgacgaccc cgtcctcggt 
10981 ggcctgacgc aggatccgct cggcctcgat cttgtgctgg gcgtaccggc tgggggcggc 
11041 ggggttcgcg gcctgggtgg tgctggcgaa caggagcacc ggcgcgggtc cgggtcttgc
11101 ccgcagcgcg gcgacgaggt cgcgcatgat gcccgcgttg acgcgttcgg cctcgggcac 
11161 cgtggcggcg ctgcgccagg tcgacccgcc ggcggcgtag gcgaccagat gcacgacgac 
11221 gtcggtgtcg gcgacgacct gcgcgacccg gccgggttcg agcaggtcga ctcgaaggtg 
11281 ctcgatcccg gcgctgcctg gtggctggtc gcgagacccg gtgcgcgcga cggcccgcag
11341 tcggagaggg tgtgtggtaa attcgcgaag aagggcgctt ccgacgaatc cagaaacgcc 
11401 gagaagtgtg acatgtcttg tcatctacta atgcattccg atagccaccg gcgcatggaa 
11461 tccatttgtt ccccccaggg tggtgtcggg tgacaaatcc ggcctcaggt cggcctcaag 
11521 cctctttcga gcgggtgctg aggcttcccg cgtaccctcg gtggcctgcg ttcgggcggg 
11581 tgtcggggaa agggcggatc gaggagttcg gtagggcgtc gcggcgcgta ctccgggact
11641 gatccgggtc gacgccccga cgcgtgacag ggcgtcgatc cgtgccgccc gtaccgccgg 
11701 ttttcggcga tggtcgcaga ttcctcccga cgtggtggac tcattggttc tcccgggtgt 

11761 ggccgcaccg tcggtggcct cgtcgggggt gtcggagacc gggtcgatcg ccgtccccgg
11821 ccgtgccgac cagggtcggt ccgtcgccga ggcgggtcac cgtcgggtgg acccggtccg 

11881 ccggcggcca ccgcccgatc gtgcccacct tcgcctccgc gggtaaatgc ttcgtcgatc
11941 tgatcgacac ttccggcgac gctatcaccg gagcattccc cggcaccacc ggtcgatgcc 
12001 tcgcgctttc caaacaggga aaacagcagc tcacagcggt tccaggcgcc gggcaatcct 
12061 agcgaagagt ctcgatgggg tcaaggtgaa ttctgtcaca gatgtttttg ttaaatgtac 
12121 tttcttcagc caccctcgac gttcatacaa ttggccggca tctctaccaa gggggagtga 
12181 gtggttgacg tgcccgatct actcggcacc cggactccgc acccagggcc gctcccattc 
12241 ccgtggcccc tgtgcggtca caacgaaccg gagctgcggg cccgcgcccg tcaattgcac 
12301 gcatatctcg aaggcatttc cgaggatgac gtggtggccg tcggcgccgc cctcgcgcgc 

12361 gagacacgcg cgcaggacgg gccgcaccgc gccgtcgtcg tggcctcctc ggtcaccgag
12421 ctgaccgccg cgctcgccgc cctcgcccag ggccgcccac acccctcggt ggtacgcggt 

12481 gtcgcccgac ccacggcacc ggtggtgttc gtcctgcccg gtcagggcgc ccagtggccc
12541 ggcatggcga cccgactgct cgccgagtcg cccgtcttcg ccgcggcgat gcgggcctgc 
12601 gagcgggcct tcgacgaggt caccgactgg tcgttgaccg aggtcctgga ctcacccgag 
12661 cacctgcgcc gcgtcgaggt ggtccagccc gcgctcttcg cggtgcagac ctcactggcc 
12721 gccctgtggc ggtcgttcgg ggtgcgaccc gacgccgtac tcggacacag catcggtgag 
12781 ctggccgccg ccgaggtctg cggcgccgtc gacgtcgagg ccgccgcgcg ggccgccgcc 
12841 ctgtggagcc gcgagatggt cccactggtg ggccggggtg acatggcggc ggtggcgctc
12901 tccccggccg agctggcagc ccgggtcgag cggtgggacg acgacgtcgt gccggccggg
12961 gtcaacggtc cccggtcggt gctgctcacc ggcgctcccg agcccatcgc acggcgggtc
13021 gccgagctgg cggcacaggg cgtacgcgcc caggtcgtca acgtgtcgat ggcggcgcac
13081 tcggcgcagg tcgacgccgt cgccgagggc atgcgctcgg cgctgacctg gttcgccccc 
13141 ggcgactccg acgtgcccta ctacgccggc ctcaccggcg ggcggctgga cacccgggaa 
13201 ctcggcgccg accactggcc gcgcagtttc cggctcccgg tgcgcttcga cgaggcgacc 
13261 cgtgcggtcc tggaactgca gcccggcacg ttcatcgagt cgagcccgca cccggtgctg 
13321 gcggcctccc tgcagcagac cctcgacgag gtcgggtccc cggccgcgat cgtgccgacc 
13381 ctgcaacgcg accagggcgg tctgcggcgg ttcctgctcg ccgtggcgca ggcgtacacc 
13441 ggtggcgtga cagtcgactg gaccgccgcc taccccgggg tgacccccgg ccacctgccg 
13501 tcggccgtcg ccgtcgagac cgacgaggga ccctcgacgg agttcgactg ggccgcgccc 
13561 gaccacgtac tgcgcgcgcg gctgctggag atcgtcggcg ccgagacggc cgcgctcgcc 
13621 gggcgggagg tcgacgcccg ggccaccttc cgggaactgg gcctcgactc ggtcctcgcg 
13681 gtgcagctgc ggacccgcct cgccacggcg accgggcggg atctgcacat cgccatgctc 
13741 tacgaccacc cgaccccgca cgccctcacc gaggcgctgc tgcgcggccc gcaggaggag 
13801 ccggggcggg gtgaggagac ggcacacccg acggaggccg aacccgacga acccgtcgcc 
13861 gtggtcgcca tggcgtgccg gctgcccggc ggcgtcacct caccggagga gttctgggag 
13921 ctgctggccg aggggcggga cgccgtcggc gggctgccca ccgaccgggg atgggacctg 
13981 gactcgctgt tccacccgga cccgacccgg tcgggcacgg cgcaccagcg cgctggtggc 
14041 ttcctcaccg gcgccacctc cttcgacgct gccttcttcg ggctgtcgcc acgggaggca 
14101 ctggccgtcg agccgcagca gcggatcacg ttggagctgt cgtgggaggt gctggaacgc 
14161 gccgggatcc ccccgacgtc gttgcggacc tcccggaccg gggtgttcgt cggtctgatc
14221 ccccaggagt acggcccccg gctggccgag gggggtgagg gcgtcgaggg ctacctgatg
14281 accgggacca ccaccagcgt cgcctccggt cgggtcgcct acaccctcgg cctggagggg
14341 ccggcgatca gcgtcgacac cgcctgctcg tcgtcgctcg tcgccgtgca cctggcgtgc
14401 cagtcgctgc ggcgcggcga gtcgacgatg gcgctcgccg gtggcgtgac ggtgatgccg
14461 acaccgggca tgctcgtgga cttcagtcgg atgaactccc tcgcccccga cggacggtcc
14521 aaggcgttct cggccgccgc cgacgggttc ggcatggccg aaggcgcagg gatgctcctg
14581 ctggaacggc tctcggacgc ccgccgccac ggccacccgg tgctcgccgt gatcaggggc
14641 accgctgtca actccgacgg cgcgagcaac ggactctccg ccccgaacgg ccgggcccag
14701 gtccgggtga tccgacaggc cctcgccgag tccgggctga cgccccacac cgtcgacgtc
14761 gtggagaccc acggcaccgg cacccgcctc ggtgatccga tcgaggcacg ggcgctctcc
14821 gacgcgtacg gcggtgaccg tgagcacccg ctgcggatcg gctcggtcaa gtccaacatc
14881 gggcacaccc aggccgccgc cggtgtcgcc ggtctgatca aactggtgtt ggcgatgcag
14941 gccggtgtcc tgccccgcac cctgcacgcc gacgagccgt caccggagat cgactggtcc
15001 tcgggcgcga tcagcctgct ccaggagccc gctgcctggc ccgccggcga gcggccccgc
15061 cgggccgggg tgtcctcgtt cggcatcagc ggcaccaacg cacacgcgat catcgaggag
15121 gcgccgccga ccggtgacga cacccgaccc gaccggatgg gcccggtggt gccctgggtg
15181 ctctcggcga gcaccggcga ggcgttgcgc gcccgggcgg cgcggctggc cgggcaccta
15241 cgcgagcacc ccgaccagga cctggacgac gtcgcctact cgctggccac cggtcgggcc
15301 gcgctggcgt accgtagtgg gttcgtgccc gccgacgcgt ccacggcgct gcggatcctc
15361 gacgaactcg ccgccggtgg atccggggac gcggtgaccg gcaccgcccg cgccccgcag
15421 cgcgtcgtct tcgtcttccc cggccaggga tggcagtggg cggggatggc agtcgacctg
15481 ctcgacggcg acccggtctt cgcctcggtg ctgcgggagt gcgccgacgc gttggaaccg
15541 tacctggact tcgagatcgt cccgttcctg cgggccgagg cgcagcgccg gacccccgac
15601 cacacgctct ccaccgaccg cgtcgacgtg gtccagccgg tgctgttcgc ggtgatggtg
15661 tccctggcgg cccggtggcg ggcgtacggg gtggaaccgg cggccgtcat cggacactcc
15721 cagggggaga ttgccgcggc gtgtgtggcc ggggcgctct cgctggacga cgcggcccgg
15781 gcggtggccc tgcgcagccg ggtcatcgcc accatgcccg gcaacggcgc gatggcctcg
15841 atcgccgcct ccgtcgacga ggtggcggcc cggatcgacg ggcgggtcga gatcgccgcc
15901 gtcaacggtc cgcgcgcggt ggtggtctcc ggcgaccgtg acgacctgga ccgcctggtc
15961 gcctcctgca ccgtcgaggg ggtgcgggcc aagcggctgc cggtggacta cgcgtcgcac
16021 tcctcgcacg tcgaggccgt ccgtgacgcg ctccacgccg aactcggcga gttccggccg
16081 ctgccgggct tcgtgccgtt ctactcgaca gtcaccggcc gctgggtcga gcccgccgaa
16141 ctcgacgccg ggtactggtt tcgcaacctg cgccacaggg tccggttcgc cgacgcggtc
16201 cgctccctcg ccgaccaggg gtacacgacg ttcctggagg tcagcgccca cccggtgctc
16261 accacggcga tcgaggagat cggtgaggac cgtggcggtg acctcgtcgc tgtccactcg
16321 ctgcgacgtg gggccggcgg tcccgtcgac ttcggctccg cgctggcccg cgccttcgtg
16381 gccggcgtcg cagtggactg ggagtcggcg taccagggtg ccggggcgcg tcgggtgccg
16441 ctgcccacgt acccgttcca gcgtgagcgc ttctggttgg aaccgaatcc ggcccgcagg
16501 gtcgccgact ccgacgacgt ctcgtccctg cggtaccgca tcgaatggca cccgaccgat
16561 ccgggtgagc cgggacggct cgacggcacc tggctgctgg cgacgtaccc cggtcgggcc
16621 gacgaccggg tcgaggcggc gcggcaggcg ctggagtccg ccggggcgcg ggtcgaggac
16681 ctggtggtgg agccccggac gggccgggtc gacctggtgc ggcggctcga cgccgtgggt
16741 ccggtggcgg gcgtgctctg cctgttcgct gtcgcggagc cggcggccga acactccccg
16801 ctggcggtga cgtcgttgtc ggacacgctc gacctgaccc aggcggtggc cgggtcgggc
16861 cgggagtgtc cgatctgggt ggtcaccgag aacgccgtcg ccgtcgggcc cttcgaacgg
16921 ctccgcgacc cggcccacgg cgcgctctgg gccctcggtc gggtcgtcgc cctggagaac
16981 cccgccgtct ggggcggcct ggtcgacgtg ccgtcgggtt cggtcgccga gctgtcgcgt
17041 cacctcggga cgaccctgtc cggcgccggc gaggaccagg tcgccctccg acccgacggg
17101 acgtacgccc gccggtggtg cagggcgggc gcgggcggca cgggccggtg gcagccccgg
17161 ggcacggtgc tcgtcaccgg cggcaccggc ggggtcggtc ggcacgtcgc ccggtggctg
17221 gcccgccagg gcaccccgtg cctggtgctg gccagccgcc ggggaccgga cgccgacggg
17281 gtcgaggagc tactcaccga actcgccgac ctgggcaccc gggccaccgt caccgcctgc
17341 gacgtcaccg accgggagca gctccgtgcc ctcctcgcga ccgtcgacga cgagcacccg
17401 ctgtcggcgg tgttccacgt cgccgcgacg ctcgacgacg gcaccgtcga gaccctcacc
17461 ggtgaccgca tcgaacgggc caaccgggcg aaggtgctcg gtgcccgcaa cctgcacgag
17521 ctgacccggg acgccgacct cgacgcgttc gtgctcttct cctcctccac cgccgcgttc
17581 ggcgcgccgg ggctcggcgg ctacgtcccg ggcaacgcct acctcgacgg tctcgcccag
17641 cagcgacgca gcgagggact cccggccacc tcggtggcgt ggggtacctg ggcgggcagc
17701 gggatggccg agggtccggt cgccgaccgg ttccgccggc acggggtcat ggagatgcac
17761 cccgaccagg ccgtcgaggg tctccgggtg gcactggtgc agggtgaggt agccccgatc
17821 gtcgtcgaca tcaggtggga ccggttcctc ctcgcgtaca ccgcgcagcg ccccacccgg
17881 ctcttcgaca ccctcgacga ggcccgtcgg gccgcgcccg gtcccgacgc cgggccgggg
17941 gtggcggcgc tggccgggct gcccgtcggg gaacgcgaga aggcggtcct cgacctggta
18001 cggacgcacg cggctgccgt cctcggccac gcctcggccg agcaggtgcc cgtcgacagg
18061 gccttcgccg aactcggcgt cgactcgctg tcggccctgg aactgcgcaa ccggctgacc
18121 actgcgaccg gggtccggct ggccacgacg acggtcttcg accacccgga cgtacggacc
18181 ctggccggac acctggccgc cgaactgggc ggcggatcgg ggcgggagcg gcccgggggc
18241 gaggccccga cggtggcccc gaccgacgag ccgatcgcca tcgtcgggat ggcctgccgg
18301 ctgccggggg gagtggactc accggagcag ctgtgggagt tgatcgtctc cgggcgggac
18361 accgcctcgg cggcacccgg ggaccggagc tgggatccgg cggagttgat ggtctccgac
18421 acgacgggca cccgtaccgc cttcggcaac ttcatgcccg gggcgggcga gttcgacgcg
18481 gcgttcttcg ggatctcgcc gcgtgaggcg ttggcgatgg atccgcagca gcggcacgcc
18541 ctggagacca cctgggaggc gctggagaac gccggtatcc ggcccgagtc gttgcgcggt
18601 acggacaccg gtgtcttcgt gggcatgtcc catcaggggt acgccaccgg ccgcccgaag
18661 cccgaggacg aggtcgacgg ctacctgttg acaggcaaca ccgcgagcgt cgcctccggt
18721 cggatcgcgt acgtgttggg gttggagggg ccggcgatca ctgtggacac ggcgtgttcg
18781 tcgtcgcttg tggcgttgca cgtggcggcg ggttcgttgc gttctgggga ctgtggtctg
18841 gcggtggcgg gtggggtgtc ggtgatggcc ggtccggagg tgttcaggga gttctcccgg
18901 cagggcgcgt tggctccgga cggcaggtgc aagcccttct cggacgaggc cgacggcttc
18961 ggtctggggg aggggtcggc cttcgtcgtg ttgcagcggt tgtcggtggc ggtgcgggag
19021 gggcgtcggg tgttgggtgt ggtggtgggt tcggcggtga atcaggatgg ggcgagtaat
19081 gggttggcgg cgccgtcggg ggtggcgcag cagcgggtga ttcggcgggc gtggggtcgt
19141 gcgggtgtgt cgggtgggga tgtgggtgtg gtggaggcgc atgggacggg gacgcggttg
19201 ggggatccgg tggagttggg ggcgttgttg gggacgtatg gggtgggtcg gggtggggtg
19261 ggtccggtgg tggtgggttc ggtgaaggcg aatgtgggtc atgtgcaggc ggcggcgggt
19321 gtggtgggtg tgatcaaggt ggtgttgggg ttgggtcggg ggttggtggg tccgatggtg
19381 tgtcggggtg ggttgtcggg gttggtggat tggtcgtcgg gtgggttggt ggtggcggat
19441 ggggtgcggg ggtggccggt gggtgtggat ggggtgcgtc ggggtggggt gtcggcgttt
19501 ggggtgtcgg ggacgaatgc tcatgtggtg gtggcggagg cgccggggtc ggtggtgggg
19561 gcggaacggc cggtggaggg gtcgtcgcgg gggttggtgg gggtggttgg tggtgtggtg
19621 ccggtggtgc tgtcggcaaa gaccgaaacc gccctgcacg cccaggcacg tcgactcgcc
19681 gaccacctgg agacgcaccc cgacgtcccg atgaccgacg tggtgtggac gctgacgcag
19741 gcccgccaac gcttcgacag gcgcgcggtc ctcctcgccg ccgaccggac ccaggccgtg
19801 gaacggctgc gcggcctcgc cgggggcgaa ccggggaccg gtgtggtgtc gggggtggcg
19861 tcgggtggtg gtgtggtgtt tgtttttcct ggtcagggtg gtcagtgggt ggggatggcg
19921 cgggggttgt tgtcggttcc ggtgtttgtg gagtcggtgg tggagtgtga tgcggtggtg
19981 tcgtcggtgg tggggttttc ggtgttgggg gtgttggagg gtcggtcggg tgcgccgtcg
20041 ttggatcggg tggatgtggt gcagccggtg ttgttcgtgg tgatggtgtc gttggcgcgg
20101 ttgtggcggt ggtgtggggt tgtgcctgcg gcggtggtgg gtcattcgca gggggagatc
20161 gcggcggcgg tggtggcggg ggtgttgtcg gtgggtgatg gtgcgcgggt ggtggcgttg
20221 cgggcgcggg cgttgcgggc gttggccggc cacggcggca tggcctcggt acgccgaggc
20281 cgcgacgacg tacagaagct cctcgacagc ggcccctgga cggggaagct ggagatcgcc
20341 gcggtcaacg gccccgacgc ggtggtggtc tccggcgacc cccgagccgt gaccgagctg
20401 gtcgagcact gtgacgggat cggggtccgg gcccggacga tccccgtcga ctacgcctcc
20461 cactccgcac aggtcgagtc gctccgggag gagctgctct ccgtcctggc cgggatcgag
20521 ggccgcccgg cgacggtgcc gttctactcc accctcaccg gtgggttcgt cgacggcacc
20581 gaactggacg ccgactactg gtaccgcaac ctgcgccacc cggtgcggtt ccacgccgcc
20641 gtcgaggcgc tggcagcgcg tgacctcacc acgttcgtcg aggtcagccc gcaccccgtg
20701 ctgtcgatgg cggtcgggga gacgcttgcc gacgtggagt ccgccgtcac tgtgggcacc
20761 ctggaacgcg acaccgacga cgtcgagcgc ttcctcacct ccctcgccga ggcgcacgtc
20821 cacggcgtac ccgtggactg ggcggcggtc ctcggctccg gaaccctggt cgacctgccc
20881 acctatccct tccagggacg gcggttctgg ctgcaccccg accgtggtcc gcgtgacgat
20941 gtcgccgact ggttccaccg ggtcgactgg acggcgacgg ccaccgacgg gtcggcccga
21001 ctcgacggtc gctggctggt ggtcgtaccc gaggggtaca cggacgacgg ctgggtcgtg
21061 gaggtgcggg ccgccctcgc cgccggtggt gccgagccgg tggtgacgac ggtcgaggag
21121 gtcaccgacc gggtcggtga cagcgacgcg gtggtgtcga tgctcgggct ggccgacgac
21181 ggtgcggccg agaccctggc gctgctgcga cgactcgacg cacaggcgtc caccacccca
21241 ctgtgggtgg tcaccgtggg ggccgtcgcc cccgccggtc cggtgcagcg ccccgaacag
21301 gcgacggtgt gggggttggc ccttgtcgcc tccctggaac gcggacaccg gtggaccggc
21361 ctgctggatc tgccgcagac accggacccg cagctacgac cccggctggt cgaggcgctc
21421 gccggtgccg aggaccaggt agcggtccgc gccgacgccg tacacgcccg tcggatcgtc
21481 cccaccccgg tcaccggagc cgggccgtac accgccccgg gcgggacgat cctcgtcacc
21541 gggggcaccg ccggtctggg tgccgtcacc gcccgatggc tcgccgagcg cggtgccgaa
21601 cacctcgccc tggtcagccg gcgcgggccg ggcaccgccg gcgtcgacga ggtggtccgg
21661 gacctgaccg ggctcggcgt acgggtgtcg gtgcactcct gcgacgtcgg cgaccgcgag
21721 tcggtcggcg ccctggtgca ggagttgaca gcagccggtg acgtggtccg gggggtggtc
21781 cacgctgccg gtctgcccca gcaggtgcca ctgaccgaca tggacccggc cgacctcgcc
21841 gacgtggtgg ccgtgaaggt cgacggcgcg gtgcacctgg ccgacctgtg cccggaggcc
21901 gaactgttcc tgctgttctc ctccggggcc ggggtgtggg gcagtgcccg tcagggtgcg
21961 tacgccgccg gaaacgcctt cctggacgcc ttcgcccgac accggcggga ccggggtctg
22021 cccgccacct cggtggcgtg ggggctctgg gcggccgggg ggatgacagg ggaccaggag
22081 gcggtgtcgt tcctgcgtga gcggggcgta cggccgatgt cggtgccgag ggcactggaa
22141 gcgctggaac gggtcctcac cgccggggag accgcggtgg tcgtcgccga cgtcgactgg
22201 gcggccttcg ccgagtcgta cacctccgcc cggccccggc cgctgctcca ccggctcgtc
22261 acacctgcgg cggcggtcgg cgagcgcgac gagccgcgtg agcagaccct ccgggaccgg
22321 ctggcggccc tgccccgggc cgagcggtcg gcggagctgg tacgcctggt ccggcgggac
22381 gccgcagccg tgctcggcag cgacgcgaag gccgtacccg ccaccacgcc gttcaaggac
22441 ctcgggttcg actcgctggc cgcggtccgg ttccgtaacc ggctggccgc ccacaccggt
22501 ctgcgtctgc cggccaccct ggtcttcgag cacccgaacg ccgcagccgt cgccgacctc
22561 ctccacgacc gactcggcga ggccggcgag ccgacccccg tccggtcggt gggcgccgga
22621 ctggccgcgc tggagcaggc cctgcccgac gcctccgaca cggagcgggt cgagctggtc
22681 gagcgcctgg aacggatgct cgccgggctc cgccccgagg ccggagccgg ggccgacgcc
22741 ccgaccgccg gtgacgacct gggggaggcc ggcgtcgacg aactcctcga cgcgctcgaa
22801 cgggaactcg acgccaggtg aacccgaact gaccgcagcc gcagccgaag cagagaccga
22861 ggacctgtga ctgacaacga caaggtggcg gagtacctcc gtcgtgcgac gctcgacctg
22921 cgggccgccc gcaagcgcct gcgcgagctg caatccgacc cgatcgcggt cgtcggcatg
22981 gcctgccgcc taccgggcgg ggtgcacctc ccgcagcacc tgtgggacct cctgcgccag
23041 gggcacgaga cggtgtccac cttccccacc gggcgcggct gggacctggc cgggctcttc
23101 cacccggacc ccgaccaccc cggcaccagc tacgtcgacc ggggtgggtt cctcgacgac
23161 gtggcgggct tcgacgccga gttcttcggg atctccccgc gcgaggccac ggccatggac
23221 ccgcaacagc ggctgctgtt ggagaccagt tgggagctgg tggagagcgc cggcatcgat
23281 ccgcactccc tgcgtggcac cccgaccggc gtcttcctcg gcgtggcgcg gctcggctac
23341 ggcgagaacg gcaccgaagc cggtgacgcc gagggctatt cggtgaccgg ggtggcaccc
23401 gctgtcgcct ccgggcggat ctcctacgcc ctcgggctgg agggtccgtc gatcagcgtg
23461 gacaccgcgt gctcgtcgtc gttggtggcg ctgcacctgg cggtcgagtc gctgcggctg
23521 ggcgagtcga gtctcgctgt cgtcggcggg gcggcggtca tggcgacacc aggggtgttc
23581 gtcgacttca gccgccagcg ggcgttggcc gctgacggca ggtcgaaggc cttcggggcc
23641 gccgccgacg ggttcggctt ctccgagggg gtctccctcg tcctgctcga acggctctcc
23701 gaggccgaaa gcaacggcca cgaggtgttg gctgtcatcc gtggctccgc cctcaaccag
23761 gacggggcca gcaacggtct cgccgcgccg aacgggaccg cccagcgcaa ggtgatccgg
23821 caggcgctac gaaactgcgg cctgaccccg gccgacgtgg acgccgtgga ggcgcacggc
23881 accggcacca cgctcggcga cccgatcgag gccaacgccc tgctggacac ctacggccgt
23941 gaccgggatc cggaccaccc gctgtggctg gggtcggtga agtcgaacat cggccacacg
24001 caggcggcgg cgggcgtcac cgggctgctc aagatggtgc tggcactgcg ccacgaggaa
24061 ctgcccgcca ccctgcacgt cgacgagccc accccgcacg tggactggtc ctcgggagcg
24121 gtacgcctgg cgacccgggg ccggccgtgg cggcggggtg accggccgag gcgggccggg
24181 gtgtcggcgt tcggcatcag cgggaccaac gcccacgtga tcgtcgagga ggcacccgag
24241 cggaccaccg agcgcaccgt cggcggcgac gtcggcccgg tcccgctcgt ggtgtccgcc
24301 cggtcggcgg cggcgctacg ggcccaggcg gcccaggtcg ccgagctggt ggagggctcc
24361 gacgtcgggc tggcggaggt cgggcggagc ctggccgtga cccgggcgcg acacgagcac
24421 cgggcggcgg tggtggcgtc gacccgggcc gaggcggtgc gggggctgcg cgaggtcgcg
24481 gcggtcgaac cgcgcggcga ggacaccgtc accggggtcg ccgagacgtc cgggcgcacc
24541 gtcgtcttcc tcttcccggg acaggggtcc cagtgggtcg ggatgggcgc ggagctgctg
24601 gactcggcac cggcgttcgc cgacacgatc cgcgcctgcg acgaggcgat ggcaccgttg
24661 caggactggt cggtctccga cgtgctccgg caggagccgg gggcaccggg actggaccgg
24721 gtcgacgtgg tgcagccggt gctgttcgcg gtgatggtgt cgttggcgcg gttgtggcag
24781 tcgtacgggg tcacccccgc tgcggtggtg gggcactcgc agggggagat cgccgccgcc
24841 cacgtggcgg gtgcgctctc cctcgccgac gcggcgaggc tggtggtggg ccgcagccgg
24901 ttgctgcggt cgctgtccgg gggcggcggc atgagcgccg tcgcgctcgg tgaggccgag
24961 gtacgccgcc gactgcggtc gtgggaggac cggatctccg tggccgccgt caacggaccc
25021 cggtcggtgg tggtggccgg ggaaccggag gcgctgcggg agtggggacg ggagcgggag
25081 gccgagggcg tacgggtccg cgagatcgac gtcgactacg cctcgcactc gccgcagatc
25141 gacagggtcc gtgacgaact cctgacggtc acgggggaga tcgagccccg gtcggcggag
25201 atcaccttct actcgacggt cgacgtccgt gctgtcgacg gcaccgacct ggacgcgggg
25261 tactggtacc gcaacctgcg ggagacggtc cggttcgccg acgcgatgac ccggttggcc
25321 gactcgggat acgacgcgtt cgtcgaggtc agcccgcatc cggtggtggt gtcggcggtc
25381 gccgaggcgg tcgaggaggc aggtgtcgag gacgccgtcg tcgtcggcac cctgtcccgg
25441 ggcgacggcg gaccgggggc gttcctgcgg tcggcggcca ccgcccactg cgccggtgtg
25501 gacgtcgact ggacgcccgc cctcccggga gctgcgacga tcccgttgcc gacgtacccg
25561 ttccaacgga agccgtactg gctgcggtcg tctgctcccg cccccgcctc ccacgatctc
25621 gcctaccggg tgtcctggac gccgatcacc ccgcccgggg acggcgtact cgacggcgac
25681 tggctggtgg tgcaccccgg gggcagcacc ggatgggtcg acgggttggc ggcggcgatc
25741 accgccggcg gtggccgggt cgtcgcccac ccggtggact ccgtgacctc ccggaccggc
25801 ctggccgagg cgctcgcccg gcgggacggc acgttccggg gggtgctgtc gtgggtggcg
25861 accgacgaac ggcacgtcga ggccggtgcg gtcgccctgc tgaccctggc gcaggcgttg
25921 ggtgacgccg gaatcgacgc accactgtgg tgcctgaccc aggaggcggt ccgtaccccc
25981 gtcgacggtg acctggcccg accggcgcag gccgccctgc acggtttcgc ccaggtcgcc
26041 cggctggagc tggcccgccg cttcggtggg gtgctcgacc tgcccgccac cgtcgacgcfc
26101 gccgggacgc gtctggtcgc ggcggtcctc gccggcggcg gcgaggacgt cgtcgccgtc
26161 cgtggcgacc gtctctacgg ccgtcgcctg gtcagggcga ccctgccgcc gcccggcggg
26221 gggttcaccc cgcacggcac cgtcctggtc accggcgcgg ccggtccggt gggcggtcgg
26281 ctggcccggt ggctcgccga acggggtgcc acccgactcg tcctgcccgg cgcacacccg
26341 ggcgaggagt tgctgaccgc gatccgggcc gccggtgcca ccgccgtggt gtgcgaaccg
26401 gaggcggagg cactgcgtac ggcgatcggc ggggagttgc cgaccgcgct cgtacacgcc
26461 gagacgttga cgaacttcgc cggcgtcgcc gacgccgacc ccgaggactt cgccgccacc
26521 gtcgcggcga agaccgcgct gccgacggtc ctggcggagg tgctcggcga ccaccgcctc
26581 gaacgggagg tctactgctc gtcggtggcc ggggtctggg gtggggtcgg catggccgcg
26641 tacgccgccg gcagcgccta cctcgacgcc ctggtcgagc accgtcgcgc ccgggggcac
26701 gccagcgcct cggtggcctg gaccccgtgg gccctgcccg gcgcggtcga cgacggtcgg
26761 ctgcgcgagc gcggcctgcg cagcctcgac gtggccgacg ccctcgggac gtgggaacgt
26821 ctgctccgcg ccggtgcggt gtcggtggcc gtcgccgacg tcgactggtc ggtcttcaca
26881 gagggtttcg cggccatccg gccgaccccg ctcttcgacg aactcctcga ccggcgcggg
26941 gaccccgacg gcgcgcccgt cgaccggccg ggggagccgg cgggcgagtg gggtcgacga
27001 atcgcggcgc tgtccccgca ggaacagcgg gagacgttgc tgaccctcgt cggcgagacg
27061 gtcgcggagg tgctgggaca cgagaccggc accgagatca acacccgtcg ggccttcagc
27121 gaactcggcc tcgactcgct gggctcgatg gccctgcgtc agcgcctggc ggcccgtacc
27181 ggcctgcgga tgccggcctc gctggtcttc gaccacccga cggtcaccgc gctcgcgcgg
27241 tacctgcgtc gactggtcgt cggggactcc gacccgaccc cggtacgggt gttcggcccc
27301 accgacgagg ccgaacccgt cgccgtggtc ggcatcggct gccggttccc cggcggcatc
27361 gccacccccg aggacctctg gcgggtggtg tccgagggca cctccatcac caccggattc
27421 cccaccgacc ggggctggga cctccggcgg ctctaccacc ccgacccgga ccaccccggc
27481 accagctacg tcgacagggg gggattcctc gacggggccc cggacttcga ccccgggttc
27541 ttcgggatca ccccccgcga ggcgctggcg atggacccgc agcagcggct caccctggag
27601 atcgcgtggg aggcggtgga acgggcgggc atcgacccgg agaccctcct cggcagcgac
27661 accggcgtct tcgtcggcat gaacggccag tcctacctgc aactgctgac cggggagggt
27721 gaccggctca acggctacca ggggttgggc aactcggcga gcgtgctctc cggccgtgtc
27781 gcctacacct tcgggtggga ggggccggcg ctgacggtgg acaccgcctg ctcgtcctcg
27841 ctggtcgcca tccacctcgc catgcagtcg ctgcgtcggg gtgagtgctc gctggcgttg
27901 gccggcgggg tgacggtcat ggccgacccg tacaccttcg tggacttcag cgcacagcgg
27961 gggctcgccg ccgacgggcg gtgcaaggcg ttctccgcgc aggccgacgg gttcgccctc
28021 gccgagggcg tcgcggcgct cgtcctcgaa ccgttgtcca aggcgcggcg aaacggccac
28081 caggtgctgg cggtgctgcg cggcagcgcc gtcaaccagg acggggccag caacggcctc
28141 gccgccccga acgggccgtc gcaggaacgg gtgatcaggc aggccctgac cgcctccggg
28201 ctgcgtcccg ccgacgtcga catggtggag gcgcacggga cgggcaccga actcggcgac
28261 ccgatcgagg ccggggcgct catcgcggcg tacggccggg accgggaccg gccgctctgg
28321 ctgggctcgg tgaagacgaa catcggccac acccaggccg ccgccggtgc cgccggggtg
28381 atcaaggcgg tcctggcgat gcggcacggc gtactcccga ggtcgctgca cgccgacgag
28441 ttgtccccgc acatcgactg ggcggacggg aaggtcgagg tgctccgcga ggcacgacag
28501 tggccccccg gtgagcgccc ccgccgcgcc ggggtgtcct ccttcggcgt cagcgggacc
28561 aacgcccacg tcatcgtcga ggaggcaccc gccgaaccgg accccgaacc ggttcccgcc
28621 gccccgggcg ggcccctgcc cttcgtcctg cacggacgca gcgtccagac ggtccggtcc
28681 caggcgcgga ccctcgccga acacctgcgc accaccggcc accgggacct cgccgacacc
28741 gcccgtaccc tggccaccgg tcgcgcccgt ttcgacgtcc gggccgcagt gctcggcacc
28801 gaccgggagg gtgtctgcgc cgccctcgac gcgctggcgc aggatcgccc ctcgcccgac
28861 gtcgtcgccc cggcggtctt cgccgcccgt acccccgtcc tggtcttccc cgggcagggg
28921 tcgcagtggg tcggcatggc ccgtgacctg ctcgactcct ccgaggtgtt cgccgagtcg
28981 atgggccggt gcgccgaggc gctgtcgccg tacaccgact gggacctgct cgacgtggtc
29041 cgtggggtcg gcgaccccga cccgtacgac cgggtggacg tgctccagcc ggtgctgttc
29101 gcggtgatgg tgtcgctggc gcggttgtgg cagtcgtacg gggtgactcc gggtgcggtg
29161 gtgggtcact cgcaggggga gatcgccgcc gcgcacgtgg ctggtgcgtt gtcgttggcc
29221 gacgccgcca gggtggtggc gttgcgcagc cgggtgctgc gggagctcga cgaccagggc
29281 ggcatggtgt cggtcggcac ctcccgcgcc gagttggact cggtcctgcg ccggtgggac
29341 gggcgggtcg cggtggcggc ggtgaacgga cccggcacgc tcgtggtggc cggacccacc
29401 gccgaactgg acgagttcct cgcggtggcc gaggcccgcg agatgaggcc gcgtcggatc
29461 gcggtgcgct acgcgtcgca ctccccggag gtggcccggg tcgaacagcg gctcgccgcc
29521 gaactcggca ccgtcaccgc cgtcggcggc acggtcccgc tctactccac cgccaccggg
29581 gacctcctcg acaccacagc catggacgcc gggtactggt accgcaacct gcgccaaccg
29641 gtgctgttcg agcacgccgt ccgcagcctc ctggagcggg gattcgagac gttcatcgag
29701 gtcagcccgc accctgtgct gctgatggcg gtcgaggaga ccgccgagga cgccgagcgc
29761 ccggtcaccg gcgtgccgac gctgcgccgc gaccacgacg ggccgtcgga gttcctccgc
29821 aacctcccgg gggcgcacgt gcacggggtc gacgtcgacc tgcgtccggc ggtcgcccac
29881 ggccgcctgg tcgacctgcc cacctacccc ttcgacaggc agcggctctg gcccaagccg 
29941 caccgcaggg ccgacacctc gtcgctgggg gtccgtgact cgacccaccc gctgctgcac 
30001 gccgcagtcg acgtacccgg tcacggcgga gcggtgttca ccgggcggct ctcccccgac 
30061 gagcagcagt ggctgaccca gcacgtggtg ggtgggcgga acctggtgcc cggcagtgtc
30121 ctggtcgacc tcgcgcccac cgccggggcc gacgtcggcg tgccggtgct ggaggaactc 
30181 gtcctgcagc agccgctggc gttgaccgcc gccggtgcgc tgctgcgccc gtcggtcggc 
30241 gccgccgacg aggacgggcg gcggccggtc gagatccacg ccgccgagga cgtctccgac 
30301 ccggccgagg cccggtggtc ggcgtacgcg accgggaccc tcgccgtcgg cgtggccggc 
30361 ggcggccggg acggcacaca gtggcccccg cccggcgcca ccgccctgac gttgaccgac 
30421 cactacgaca ccctcgccga actgggctac gagtacgggc cggcgttcca ggcgctgcgc 
30481 gccgcgtggc agcacggcga cgtggtctac gcggaggtgt ccctcgacgc cgtcgaggag 
30541 gggtacgcgt tcgacccggt gctgctcgac gccgtcgccc agaccttcgg cctgaccagt 
30601 cgcgcccccg ggaagctccc cttcgcctgg cggggcgtca ccctgcacgc caccggggcc 
30661 actgcggtac gggtggtggc gacccccgcc ggaccggacg cggtggccct gcgggtcacc 
30721 gacccgaccg gtcagctcgt cgccacggtg gacgccctgg tcgtcaggga cgccggggcg 
30781 gatcgggacc agccgcgcgg ccgcgacggc gacctgcacc gcctggagtg ggtacggctg 
30841 gccaccccgg acccgacccc ggcggcggtg gtgcacgtgg cggccgacgg gctcgacgac 
30901 ctgctgcgcg ccggtggtcc ggcaccacag gccgtcgtcg tccgctaccg tcccgacggc 
30961 gacgacccga cggccgaggc ccgtcacggg gtgctctggg cggccacgct cgtgcgccgt 
31021 tggctcgacg acgaccggtg gcccgccacc accctggtgg tggccacgtc cgcaggggtc 
31081 gaggtctccc ccggggacga cgtgccgcgc cccggggccg ccgccgtgtg gggggtgctg 
31141 cgctgcgccc aggcggagtc cccggaccgc tccgtgctcg tcgacggcga cccggagacg 
31201 cccccggcgg tgccggacaa tccgcagctc gcggtccgtg acggtgcggt gttcgtgcca 
31261 cggctgacgc cgctcgccgg tcccgtgccg gccgtcgccg accgggcgta ccggctggtg 
31321 cccggcaacg gcggctccat cgaggcagtg gccttcgccc ccgtccccga cgccgaccgg 
31381 cccctggcgc cggaggaggt acgcgtcgcc gtccgcgcca ccggcgtgaa cttccgtgac 
31441 gtcctgctcg cgctcggcat gtacccggaa ccggccgaga tgggcaccga ggcgtccggt 
31501 gtggtcaccg aggtcgggtc gggtgtccgg cggttcaccc ccggccaggc ggtgacgggc 
31561 ctgttccagg gggccttcgg gccggtggcg gtcgccgacc accggctcct caccccggtc 
31621 cccgacgggt ggcgggcggt ggacgccgca gccgtaccca tcgcgttcac caccgcccac 
31681 tacgcgctgc acgacctggc cgggttgcag gccgggcagt ccgtgctggt ccacgccgcc 
31741 gccggcgggg tggggatggc tgccgtcgcg ttggcccgtc gggccggggc ggaggtgttc 
31801 gccacggcca gcccggccaa acacccgacg ctgcgggcgc tcggcctcga cgacgaccac 
31861 atcgcctcgt cccgggagag cgggttcggt gagcggttcg ccgcgcgtac cggggggcgg 
31921 ggcgtcgacg tggtcctgaa ctcgctcacc ggcgacctgc tcgacgagtc cgcgcggctg 
31981 ctcgccgacg gcggggtctt cgtcgagatg ggcaagaccg acctgcggcc ggcggagcag 
32041 ttccggggcc ggtacgtccc gttcgacctg gccgaggccg gtcccgatcg gctcggcgag 
32101 atcctggagg aggtcgtcgg tctgctggcc gccggtgccc tcgaccggtt gccggtgtcg 
32161 gtgtgggagt tgtcggcggc cccggccgcg ctcacccaca tgagccgggg ccgacacgtg 
32221 ggcaagctcg tcctcaccca gcccgccccc gtgcaccccg acggaacggt gctggtcacc 
32281 ggcgggaccg gcaccctggg gcggctggtc gcccgccacc tggtgaccgg gcacggcgta
32341 ccccacctcc tggtggccag ccggcgcggt ccggcggccc cgggcgcggc cgagctgcgc 
32401 gccgacgtcg aaggcctcgg cgcgaccatc gagatcgtcg cctgcgacac cgccgaccgg
32461 gaggcgctcg cggcgctgct cgactcgatc cccgcggacc gtccgctgac cggggtggtg 
32521 cacaccgccg gggtcctggc cgacgggctg gtcacctcca tcgacgggac cgccaccgat 
32581 caggtcctgc gggccaaggt cgacgcggcg tggcacctgc acgacctgac ccgggacgcg 
32641 gacctgagct tcttcgtgct gttctcgtcg gcggcgtcgg tgctggccgg tcccgggcag 
32701 ggcgtgtacg cggcggccaa cggggtcctc aacgccctgg ccgggcaacg gcgggccctc 
32761 ggactgcccg cgaaggcgct cgggtggggc ctgtgggcgc aggccagcga gatgaccagc 
32821 ggcctcggtg accggatcgc ccgtaccggg gtcgccgcgc tgccgaccga gcgggcgctg 
32881 gccctgttcg acgcggctct gcgcagcggc ggggaggtgc tgttcccgct gtctgtcgac 
32941 aggtcggcgc tgcgccgggc cgagtacgtc cccgaggtgc tgcgcggcgc ggtccggtcc 
33001 acgccacggg ccgccaacag ggccgagacc ccgggccggg gcctgctcga ccgtctcgtc 
33061 ggtgcacccg agaccgatca ggtggccgcg ctggccgagc tggtccgccc gcacgcggcg 
33121 gcggtcgccg gctacgactc ggccgaccag ctgcccgaac gcaaggcgtt caaggacctc 
33181 gggttcgact cgctggcggc ggtggagctg cgcaaccggc tcggcgtcac caccggcgta 
33241 cggctgccca gcacgctggt gttcgaccac ccgacaccgc tggcggtggc cgaacacctg 
33301 cggtcggagt tgttcgccga ctccgcgccg gacgtcgggg tcggtgcgcg cctcgacgac 
33361 ctggaacggg cgctcgacgc cctgcccgac gcgcagggac acgccgacgt cggggcccgc 
33421 ctggaggcgc tgctgcgccg gtggcagagc cgacgacccc cggagaccga gccagtgacg 
33481 atcagtgacg acgccagtga cgacgagctg ttctcgatgc tcgacaggcg tctcggcggg 
33541 ggaggggacg tctaggtgac aggtcgattc cgccccgcgg cagtggaccg taccgccctg 
33601 acaggtccac cgggttcgcg tcgcctccca cacccgacgg ccggggtatc cacggaaggg
33661 atccgatgag cgagagcagc ggcatgaccg aggaccgcct ccggcgctat ctcaagcgca
33721 ccgtcgccga actcgactcg gtgacaggtc ggctcgacga ggtcgagtac cgggcccgcg 
33781 aaccgatcgc cgtcgtcggc atggcctgcc ggttccccgg gggtgtggac tcgccggagg 
33841 cgttctggga gttcatccgc gacggtggtg acgcgatcgc cgaggcgccc acggaccgtg 
33901 gctggccgcc ggcaccgcga ccccgcctcg gtggtctcct cgcggagccg ggcgcgttcg
33961 acgccgcctt cttcggcatc tcaccccgcg aggcgctcgc gacggacccc cagcagcgcc
34021 tgatgctgga gatctcctgg gaggcgttgg agcgtgcggg tttcgacccg tcgagcctgc 
34081 gcggcagcgc cggtggcgtc ttcaccggtg tcggtgcggt ggactacgga cccaggccgg 
34141 acgaggcacc cgaggaggtg ctcggctacg tcggcatcgg caccgcctcc agcgtcgcct 
34201 ccggacgggt ggcgtacacc ctggggttgg agggtccagc cgtcaccgtc gacaccgcct 
34261 gctcctccgg gctcaccgcg gtgcacctgg cgatggagtc gctgcgccgc gacgagtgca 
34321 ccctggtcct cgccggtggg gtcaccgtga tgagcagccc gggtgcgttc accgagttcc 
34381 gcagccaggg cgggttggcc gaggacggcc gctgcaaacc gttctcccgc gccgccgacg 
34441 gcttcgggct cgccgagggg gccggggtcc tggtgctcca acggctgtcc gtcgcccggg 
34501 ccgagggccg gccggtgctg gccgtactgc gtggctcggc gatcaaccag gacggtgcca 
34561 gcaacgggct caccgcgccg agcggccccg cccagcggcg ggtgatcagg caggcgttgg 
34621 agcgggcgcg gctgcgtccc gtcgacgtgg actacgtgga ggcccacggc accggcaccc 
34681 ggctgggcga tccgatcgag gcgcacgccc tgctcgacac gtacggtgcc gaccgggaac 
34741 ccggccgccc gctctgggtc ggatcggtga agtccaacat cggtcacacc caggcggcgg 
34801 cgggggtggc cggggtgatg aagaccgtgc tggcgctgcg gcatcgggag atcccggcga 
34861 cgttgcactt cgacgagccc tcgccgcacg tcgactggga ccggggtgcg gtgtcggtgg 
34921 tgtccgagac ccggccctgg ccggtggggg agcgcccgcg ccgggcgggg gtgtcctcgt 
34981 tcggcatcag cggcaccaac gcgcacgtca tcgtcgagga ggcgccgagc ccgcaggcgg 
35041 ccgacctcga cccgaccccc ggcccggcaa ccggagcgac ccccggaacg gatgccgccc 
35101 ccaccgccga gccgggtgcg gaggcggtcg cactggtgtt ctccgcgcgc gacgagcggg 
35161 ccctgcgcgc ccaggcggcc cggctcgccg accgtctcac cgacgacccg gccccctcgt 
35221 tgcgcgacac cgccttcacc ctggtcaccc gccgtgccac ctgggagcat cgggcggtcg 
35281 tcgtcggcgg gggcgaggag gtcctcgccg gcctccgggc cgtcgccggg ggacgtcccg
35341 tcgacggagc cgtcagcggg cgggcgcgcg ccggccgccg ggtggtgctg gtcttccccg 
35401 ggcagggcgc acagtggcag ggcatggccc gggacctgct gcggcagtcg ccgaccttcg 
35461 cggagtccat cgacgcctgc gagcgggcgc tcgccccgca cgtggactgg tcgctgcgcg
35521 aggtgctcga cggcgagcag tcgttggacc ccgtcgacgt ggtgcagccg gtgctgttcg 
35581 cggtgatggt gtcgttggcg cggttgtggc agtcgtacgg ggtgactccg ggtgcggtgg
35641 tgggtcactc gcagggggag atcgccgccg cgcacgtggc tggtgcgttg tcgttggccg 
35701 acgccgccag ggtggtggcg ttgcgcagcc gggtgctgcg ccgtctcggt ggtcacggcg 
35761 ggatggcgtc gttcgggctc caccccgacc aggccgccga gcggatcgcg cgcttcgcgg 
35821 gtgcgctgac tgtcgcctcg gtcaacggtc cccgttcggt ggtgctggcc ggggagaacg 
35881 gcccgttgga cgagctgatc gccgagtgcg aggccgaggg cgtgaccgcc cgtcggatcc 
35941 ccgtcgacta cgcctcacac tccccgcagg tggagtcgct gcgtgaggag ctgctcgccg 
36001 cactggccgg ggtccgtccg gtgtcggccg ggatccccct gtactcgacc ctgaccggtc 
36061 aggtcatxga aacggcgacg atggacgccg actactggtt cgccaacctc cgggagccgg 
36121 tgcgcttcca ggacgccacc aggcagctcg ccgaggcggg gttcgacgcc ttcgtcgagg 
36181 tcagcccgca cccggtgttg acagtcggtg tcgaggccac cctcgaggca gtgctgcccc
36241 ccgacgccga tccgtgtgtc acaggcaccc tgcgccgcga acgcggcggt ctcgcgcagt 
36301 tccacaccgc gctcgccgag gcgtacaccc ggggggtgga ggtcgactgg cgtaccgcag 
36361 tgggtgaggg acgcccggtc gacctgccgg tctacccgtt ccaacgacag aacttctggc 
36421 tcccggtccc cctgggccgg gtccccgaca ccggcgacga gtggcgttac cagctcgcct 
36481 ggcaccccgt cgacctcggg cggtcctccc tggccggacg ggtcctggtg gtgaccggag
36541 cggcagtacc cccggcctgg acggacgtgg tccgcgacgg cctggaacag cgcggggcga 
36601 ccgtcgtgtt gtgcaccgcg cagtcgcgcg cccggatcgg cgccgcactc gacgccgtcg 
36661 acggcaccgc cctgtccact gtggtctctc tgctcgcgct cgccgagggc ggtgctgtcg 
36721 acgaccccag cctggacacc ctcgcgttgg tccaggcgct cggcgcagcc gggatcgacg 
36781 tccccctgtg gctggtgacc agggacgccg ccgccgtgac cgtcggagac gacgtcgatc 
36841 cggcccaggc catggtcggt gggctcggcc gggtggtggg cgtggagtcc cccgcccggt 
36901 ggggtggcct ggtggacctg cgcgaggccg acgccgactc ggcccggtcg ctggccgcca 
36961 tactggccga cccgcgcggc gaggagcagt tcgcgatccg gcccgacggc gtcaccgtcg 
37021 cccgtctcgt cccggcaccg gcccgcgcgg cgggtacccg gtggacgccg cgcgggaccg 
37081 tcctggtcac cggcggcacc ggcggcatcg gcgcgcacct ggcccgctgg ctcgccggtg 
37141 cgggcgccga gcacctggtg ctgctcaaca ggcggggagc ggaggcggcc ggtgccgccg 
37201 acctgcgtga cgaactggtc gcgctcggca cgggagtcac catcacggcc tgcgacgtcg 
37261 ccgaccgcga ccggttggcg gccgtcctcg acgccgcacg ggcgcaggga cgggtggtca 
37321 cggcggtgtt ccacgccgcc gggatctccc ggtccacagc ggtacaggag ctgaccgaga 
37381 gcgagttcac cgagatcacc gacgcgaagg tgcggggtac ggcgaacctg gccgaactct 
37441 gtcccgagct ggacgccctc gtgctgttct cctcgaacgc ggcggtgtgg ggcagcccgg
37501 ggctggcctc ctacgcggcg ggcaacgcct tcctcgacgc cttcgcccgt cgtggtcggc
37561 gcagtgggct gccggtcacc tcgatcgcct ggggtctgtg ggccgggcag aacatggccg 
37621 gtaccgaggg cggcgactac ctgcgcagcc agggcctgcg cgccatggac ccgcagcggg 
37681 cgatcgagga gctgcggacc accctggacg ccggggaccc gtgggtgtcg gtggtggacc 
37741 tggaccggga gcggttcgtc gaactgttca ccgccgcccg ccgccggccc ctcttcgacg 
37801 aactcggtgg ggtccgcgcc ggggccgagg agaccggtca ggaatcggat ctcgcccggc 
37861 ggctggcgtc gatgccggag gccgaacgtc acgagcatgt cgcccggctg gtccgagccg 
37921 aggtggcagc ggtgctgggc cacggcacgc cgacggtgat cgagcgtgac gtcgccttcc 
37981 gtgacctggg attcgactcc atgaccgccg tcgacctgcg gaaccggctc gcggcggtga 
38041 ccggggtccg ggtggccacg accatcgtct tcgaccaccc gacagtggac cgcctcaccg 
38101 cgcactacct ggaacgactc gtcggtgagc cggaggcgac gaccccggct gcggcggtcg
38161 tcccgcaggc acccggggag gccgacgagc cgatcgcgat cgtcgggatg gcctgccgcc
38221 tcgccggtgg agtgcgtacc cccgaccagt tgtgggactt catcgtcgcc gacggcgacg 
38281 cggtcaccga gatgccgtcg gaccggtcct gggacctcga cgcgctgttc gacccggacc 
38341 ccgagcggca cggcaccagc tactcccggc acggcgcgtt cctggacggg gcggccgact 
38401 tcgacgcggc gttcttcggg atctcgccgc gtgaggcgtt ggcgatggat ccgcagcagc 
38461 ggcaggtcct ggagacgacg tgggagctgt tcgagaacgc cggcatcgac ccgcactccc
38521 tgcgcggtac ggacaccggt gtcttcctcg gcgctgcgta ccaggggtac ggccagaacg 
38581 cgcaggtgcc gaaggagagt gagggttacc tgctcaccgg tggttcctcg gcggtcgcct 
38641 ccggtcggat cgcgtacgtg ttggggttgg aggggccggc gatcactgtg gacacggcgt 
38701 gttcgtcgtc gcttgtggcg ttgcacgtgg cggccgggtc gctgcgatcg ggtgactgtg 
38761 ggctcgcggt ggcgggtggg gtgtcggtga tggccggtcc ggaggtgttc accgagttct 
38821 ccaggcaggg cgcgctggcc cccgacggtc ggtgcaagcc cttctccgac caggccgacg 
38881 ggttcggatt cgccgagggc gtcgctgtgg tgctcctgca gcggttgtcg gtggcggtgc 
38941 gggaggggcg tcgggtgttg ggtgtggtgg tgggttcggc ggtgaatcag gatggggcga 
39001 gtaatgggtt ggcggcgccg tcgggggtgg cgcagcagcg ggtgattcgg cgggcgtggg 
39061 gtcgtgcggg tgtgtcgggt ggggatgtgg gtgtggtgga ggcgcatggg acggggacgc 
39121 ggttggggga tccggtggag ttgggggcgt tgttggggac gtatggggtg ggtcggggtg 
39181 gggtgggtcc ggtggtggtg ggttcggtga aggcgaatgt gggtcatgtg caggcggcgg 
39241 cgggtgtggt gggtgtgatc aaggtggtgt tggggttggg tcgggggttg gtgggtccga 
39301 tggtgtgtcg gggtgggttg tcggggttgg tggattggtc gtcgggtggg ttggtggtgg 
39361 cggatggggt gcgggggtgg ccggtgggtg tggatggggt gcgtcggggt ggggtgtcgg 
39421 cgtttggggt gtcggggacg aatgctcatg tggtggtggc ggaggcgccg gggtcggtgg 
39481 tgggggcgga acggccggtg gaggggtcgt cgcgggggtt ggtgggggtg gctggtggtg 
39541 tggtgccggt ggtgctgtcg gcaaagaccg aaaccgccct gaccgagctc gcccgacgac 
39601 tgcacgacgc cgtcgacgac accgtcgccc tcccggcggt ggccgccacc ctcgccaccg
39661 gacgcgccca cctgccctac cgggccgccc tgctggcccg cgaccacgac gaactgcgcg
39721 acaggctgcg ggcgttcacc actggttcgg cggctcccgg tgtggtgtcg ggggtggcgt
39781 cgggtggtgg tgtggtgttt gtttttcctg gtcagggtgg tcagtgggtg gggatggcgc
39841 gggggttgtt gtcggttccg gtgtttgtgg agtcggtggt ggagtgtgat gcggtggtgt
39901 cgtcggtggt ggggttttcg gtgttggggg tgttggaggg tcggtcgggt gcgccgtcgt
39961 tggatcgggt ggatgtggtg cagccggtgt tgttcgtggt gatggtgtcg ttggcgcggt
40021 tgtggcggtg gtgtggggtt gtgcctgcgg cggtggtggg tcattcgcag ggggagatcg
40081 cggcggcggt ggtggcgggg gtgttgtcgg tgggtgatgg tgcgcgggtg gtggcgttgc
40141 gggcgcgggc gttgcgggcg ttggccggcc acggcggcat ggtctccctc gcggtctccg
40201 ccgaacgcgc ccgggagctg atcgcaccct ggtccgaccg gatctcggtg gcggcggtca 
40261 actccccgac ctcggtggtg gtctcgggtg acccacaggc cctcgccgcc ctcgtcgccc 
40321 actgcgccga gaccggtgag cgggccaaga cgctgcctgt ggactacgcc tcccactccg 
40381 cccacgtcga acagatccgc gacacgatcc tcaccgacct ggccgacgtc acggcgcgcc 
40441 gacccgacgt cgccctctac tccacgctgc acggcgcccg gggcgccggc acggacatgg 
40501 acgcccggta ctggtacgac aacctgcgct caccggtgcg cttcgacgag gccgtcgagg 
40561 ccgccgtcgc cgacggctac cgggtcttcg tcgagatgag cccacacccg gtcctcaccg 
40621 ccgcggtgca ggagatcgac gacgagacgg tggccatcgg ctcgctgcac cgggacaccg 
40681 gcgagcggca cctggtcgcc gaactcgccc gggcccacgt gcacggcgta ccagtggact 
40741 ggcgggcgat cctccccgcc acccacccgg ttcccctgcc gaactacccg ttcgaggcga 
40801 cccggtactg gctcgccccg acggcggccg accaggtcgc cgaccaccgc taccgcgtcg 
40861 actggcggcc cctggccacc accccggcgg agctgtccgg cagctacctc gtcttcggcg 
40921 acgccccgga gaccctcggc cacagcgtcg agaaggccgg cgggctcctc gtcccggtgg 
40981 ccgctcccga ccgggagtcc ctcgcggtcg ccctggacga ggcggccgga cgactcgccg 
41041 gtgtgctctc cttcgccgcc gacaccgcca cccacctggc ccggcaccga ctcctcggcg 
41101 aggccgacgt cgaggcccca ctctggctgg tcaccagcgg cggcgtcgca ctcgacgacc 
41161 acgacccgat cgactgcgac caggcaatgg tgtgggggat cggacgggtg atgggtctgg 
41221 agaccccgca ccggtggggc ggcctggtgg acgtgaccgt cgaacccacc gccgaggacg 
41281 gggtggtctt cgccgccctc ctggccgccg acgaccacga ggaccaggtg gcgctgcgcg
41341 acggcatccg ccacggccga cggctcgtcc gcgccccgct gaccacccga aacgccaggt
41401 ggacaccggc gggcacggcg ctcgtcacgg gcggtacggg tgccctcggc ggccacgtcg 
41461 cgcggtacct ggcccggtcc ggggtgaccg atctcgccct gctcagcagg agcggccccg 
41521 acgcacccgg tgccgccgaa ctggccgccg aactggccga cctcggggcc gagccgagag 
41581 tcgaggcgtg cgacgtcacc gacgggccac gcctgcgcgc cctggtgcag gagctacggg 
41641 aacaggaccg gccggtccgg atcgtcgtcc acaccgcagg ggtgcccgac tcccgtcccc 
41701 tcgaccggat cgacgaactg gagtcggtca gcgccgcgaa ggtgaccggg gcgcggctgc 
41761 tcgacgagct ctgcccggac gccgacacct tcgtcctgtt ctcctcgggg gcgggagtgt 
41821 ggggtagcgc gaacctgggc gcgtacgcgg cagccaacgc ctacctggac gccctggccc 
41881 accgccgccg ccaggcgggc cgggccgcga cctcggtcgc ctggggggcg tgggccggcg 
41941 acggcatggc caccggcgac ctcgacgggc tgacccggcg cggtctgcgg gcgatggcac 
42001 cggaccgggc gctgcgcgcc tgcaccaggc gttggaccac ccacgacacc tgtgtgtcgg 
42061 tagccgacgt cgactgggac cgcttcgccg tgggtttcac cgccgcccgg cccagacccc 
42121 tgatcgacga actcgtcacc tccgcgccgg tggccgcccc caccgctgcg gcggccccgg 
42181 tcccggcgat gaccgccgac cagctactcc agttcacgcg ctcgcacgtg gccgcgatcc 
42241 tcggtcacca ggacccggac gcggtcgggt tggaccagcc cttcaccgag ctgggcttcg 
42301 actcgctcac cgccgtcggc ctgcgcaacc agctccagca ggccaccggg cggacgctgc 
42361 ccgccgccct ggtgttccag caccccacgg tacgcagact cgccgaccac ctcgcgcagc 
42421 agctcgacgt cggcaccgcc ccggtcgagg cgacgggcag cgtcctgcgg gacggctacc 
42481 ggcgggccgg gcagaccggc gacgtccggt cgtacctgga cctgctggcg aacctgtcgg 
42541 agttccggga gcggttcacc gacgcggcga gcctgggcgg acagctggaa ctcgtcgacc 
42601 tggccgacgg atccggcccg gtcactgtga tctgttgcgc gggcactgcg gcgctctccg 
42661 ggccgcacga gttcgcccga ctcgcctcgg cgctgcgcgg caccgtgccg gtgcgcgccc 
42721 tcgcgcaacc cgggtacgag gcgggtgaac cggtgccggc gtcgatggag gcagtgctcg 
42781 gggtgcaggc ggacgcggtc ctcgcggcac agggcgacac gccgttcgtg ctggtcggac 
42841 actcggcggg ggccctgatg gcgtacgccc tggcgaccga gctggccgac cggggccacc 
42901 cgccacgtgg cgtcgtgctc ctcgacgtgt acccacccgg tcaccaggag gcggtgcacg 
42961 cctggctcgg cgagctgacc gccgccctgt tcgaccacga gaccgtacgg atggacgaca 
43021 cccggctcac ggccctgggg gcgtacgaca ggctgaccgg caggtggcgt ccgagggaca 
43081 ccggtctgcc cacgctggtg gtggccgcca gcgagccgat gggggagtgg ccggacgacg 
43141 gttggcagtc cacgtggccg ttcgggcacg acagggtcac ggtgcccggt gaccacttct 
43201 cgatggtgca ggagcacgcc gacgcgatcg cgcggcacat cgacgcctgg ttgagcgggg 
43261 agagggcatg aacacgaccg atcgcgccgt gctgggccga cgactccaga tgatccgggg
43321 actgtactgg ggttacggca gcaacggaga cccgtacccg atgctgttgt gcgggcacga 
43381 cgacgacccg caccgctggt accgggggct gggcggatcc ggggtccggc gcagccgtac 
43441 cgagacgtgg gtggtgaccg accacgccac cgccgtgcgg gtgctcgacg acccgacctt 
43501 cacccgggcc accggccgga cgccggagtg gatgcgggcc gcgggcgccc cggcctcgac 
43561 ctgggcgcag ccgttccgtg acgtgcacgc cgcgtcctgg gacgccgaac tgcccgaccc 
43621 gcaggaggtg gaggaccggc tgacgggtct cctgcctgcc ccggggaccc gcctggacct 
43681 ggtccgcgac ctcgcctggc cgatggcgtc gcggggggtc ggcgcggacg accccgacgt 
43741 gctgcgcgcc gcgtgggacg cccgggtcgg cctcgacgcc cagctcaccc cgcagcccct 
43801 ggcggtgacc gaggcggcga tcgccgcggt gcccggggac ccgcaccggc gggcgctgtt 
43861 caccgccgtc gagatgacag ccaccgcgtt cgtcgacgcg gtgctggcgg tgaccgccac 
43921 ggcgggggcg gcccagcgtc tcgccgacga ccccgacgtc gccgcccgtc tcgtcgcgga 
43981 ggtgctgcgc ctgcatccga cggcgcacct ggaacggcgt accgccggca ccgagacggt 
44041 ggtgggcgag cacacggtcg cggcgggcga cgaggtcgtc gtggtggtcg ccgccgccaa 
44101 ccgtgacgcg ggggtcttcg ccgacccgga ccgcctcgac ccggaccggg ccgacgccga 
44161 ccgggccctg tccgcccagc gcggtcaccc cggccggttg gaggagctgg tggtggtcct 
44221 gaccaccgcc gcactgcgca gcgtcgccaa ggcgctgccc ggtctcaccg ccggtggccc 
44281 ggtcgtcagg cgacgtcgtt caccggtcct gcgagccacc gcccactgcc cggtcgaact 
44341 ctgaggtgcc tgcgatgcgc gtcgtcttct cctccatggc cagcaagagc cacctgttcg 
44401 gtctcgttcc cctcgcctgg gccttccgcg cggcgggcca cgaggtacgg gtcgtcgcct 
44461 caccggctct caccgacgac atcacggcgg ccggactgac ggccgtaccg gtcggcaccg 
44521 acgtcgacct tgtcgacttc atgacccacg ccgggtacga catcatcgac tacgtccgca 
44581 gcctggactt cagcgagcgg gacccggcca cctccacctg ggaccacctg ctcggcatgc 
44641 agaccgtcct caccccgacc ttctacgccc tgatgagccc ggactcgctg gtcgagggca 
44701 tgatctcctt ctgtcggtcg tggcgacccg actggtcgtc tggaccgcag accttcgccg 
44761 cgtcgatcgc ggcgacggtg accggcgtgg cccacgcccg actcctgtgg ggacccgaca 
44821 tcacggtacg ggcccggcag aagttcctcg ggctgctgcc cggacagccc gccgcccacc 
44881 gggaggaccc cctcgccgag tggctcacct ggtctgtgga gaggttcggc ggccgggtgc 
44941 cgcaggacgt cgaggagctg gtggtcgggc agtggacgat cgaccccgcc ccggtcggga 
45001 tgcgcctcga caccgggctg aggacggtgg gcatgcgcta cgtcgactac aacggcccgt 
45061 cggtggtgcc ggactggctg cacgacgagc cgacccgccg acgggtctgc ctcaccctgg
45121 gcatctccag ccgggagaac agcatcgggc aggtctccgt cgacgacctg ttgggtgcgc
45181 tcggtgacgt cgacgccgag atcatcgcga cagtggacga gcagcagctc gaaggcgtcg
45241 cccacgtccc ggccaacatc cgtacggtcg ggttcgtccc gatgcacgca ctgctgccga
45301 cctgcgcggc gacggtgcac cacggcggtc ccggcagctg gcacaccgcc gccatccacg
45361 gcgtgccgca ggtgatcctg cccgacggct gggacaccgg ggtccgcgcc cagcggaccg 
45421 aggaccaggg ggcgggcatc gccctgccgg tgcccgagct gacctccgac cagctccgcg 
45481 aggcggtgcg gcgggtcctg gacgatcccg ccttcaccgc cggtgcggcg cggatgcggg 
45541 ccgacatgct cgccgagccg tcccccgccg aggtcgtcga cgtctgtgcg gggctggtcg 
45601 gggaacggac cgccgtcgga tgagcaccga cgccacccac gtccggctcg gccggtgcgc 
45661 cctgctgacc agccggctct ggctgggtac ggcagccctc gccggccagg acgacgccga 
45721 cgcagtacgc ctgctcgacc acgcccgttc ccggggcgtc aactgcctcg acaccgccga 
45781 cgacgactct gcgtcgacca gtgcccaggt cgccgaggag tcggtcggcc ggtggttggc 
45841 cggggacacc ggtcggcggg aggagaccgt cctgtcggtg acggtgggtg tcccaccggg 
45901 cgggcaggtc ggcgggggcg gcctctccgc ccggcagatc atcgcctcct gtgagggctc 
45961 cctgcggcgt ctcggtgtcg accacgtcga cgtccttcac ctgccccggg tggaccgggt 
46021 ggagccgtgg gacgaggtct ggcaggcggt ggacgccctc gtggccgccg gaaaggtctg 
46081 ttacgtcggg tcgtcgggct tccccggatg gcacatcgtc gccgcccagg agcacgccgt 
46141 ccgccgtcac cgcctcggcc tggtgtccca ccagtgtcgg tacgacctga cgtcgcgcca 
46201 tcccgaactg gaggtcctgc ccgccgcgca ggcgtacggg ctcggggtct tcgccaggcc 
46261 gacccgcctc ggcggtctgc tcggcggcga cggtccgggc gccgcagccg cacgggcgtc 
46321 gggacagccg acggcactgc gctcggcggt ggaggcgtac gaggtgttct gcagagacct 
46381 cggcgagcac cccgccgagg tcgcactggc gtgggtgctg tcccggcccg gtgtggcggg 
46441 ggcggtcgtc ggtgcgcgga cgcccggacg gctcgactcc gcgctccgcg cctgcggcgt
46501 cgccctcggc gcgacggaac tcaccgccct ggacgggatc ttccccgggg tcgccgcagc 
46561 aggggcggcc ccggaggcgt ggctacggtg agagcccgcc cctgacctgc gggaacccgt 
46621 gtcggtgcgg cgggacggcc gccgcggtcc ccgccccggt cagccggtgg gggtgagccg
46681 cagcaggtcc ggcgccaccg actcggccac ctccccgacg tggtcggcga ggtagaagtg 
46741 cccgcccggg aaggtccggg tacggccggg gactaccgag tacggcagcc agcgttgggc 
46801 gtcctccacc gtcgtcaacg ggtcggtgtc accgcagagg gtggtgatgc cggcccgcag 
46861 cggcggcccg gcctgccagg cgtaggagcg cagcacccgg tggtcggccc gcagcaccgg 
46921 cagcgacatg tccaacagcc cctggtcggc caatgcggcc tcgctgaccc cgagcctgcg 
46981 catctgctcg acgagtccgt cctcgtcggg caggtcggtg cgccgctcgt ggacccgggg 
47041 ggcggtctgc ccggagacga acaaccgcag cggtcgcacc cccggacgag cctccaggcg 
47101 acgggcggtc tcgtaggcga ccagggcgcc catgctgtga ccgaacaggg cgaacggaac 
47161 ctcgccgacg aggtcgcgca gcacggccgc gacctcgtcg gcgatctccc cggcggtgcc 
47221 gagagcccgc tcgtcacgtc ggtcctgccg gcccgggtac tgcaccgccc acacgtcgac 
47281 ctccggggcc agtgcccggg cgaggtcgag gtacgagtcg gcggcggctc ccgcgtgcgg 
47341 gaagcagtac agccgggccc ggtgtccgtc ggcggacccg aaccgccgca accaggtgtt 
47401 catcggtgtc tcatccgttc ggtcgcaccg gcaggtggtc gatgccgcgc agcaggagcg 
47461 accgccgcca gacaacctcg tcggagggga agcccagcga cagcttcggg aagcggtcga 
47521 acagggcccc cagggcgacc tctccctcca gcttggccag cgggcggccc atgcagtagt 
47581 ggatgccgtg cccgaaggtg aggtgtcccc ggctgtccct ggtgacgtcg aaccggtcgg 
47641 ggtcggggaa ctgtcccggg tcgcggttgg ccgccccgtt ggcgatcagg acggtgctgt 
47701 acgccgggat cgtcaccccg ccgatctcca cctcggcggt ggcgaaccgg gtggtggtct
47761 ccggtggggc ctggtagcgc aggatctcct ccaccgctcc gggcagcagt gccgggtcct 
47821 tccggaccag cgcgagctgg tcggggtggg tcagcagcag gtaggtgccg atcccgatga 
47881 ggctcaccga cgcctcgaat cccgccagca gcagcaccag cgcgatggag gtgagttcgt 
47941 cgcggctgag ccggtcggcg tcgtcgtcct ggacccggat c 

(SEQ ID NO: 1)

TDP-L-megosamine
(-rhodosamine)



FIGURE 10

SEQUENCE LISTING

<110> Kosan Biosciences, Inc. 

<120> Recombinant Megalomicin Biosynthetic 
Genes and Uses Thereof 

<130> 300622004740 

<140> To be assigned
<141> Herewith 

<150> US 60/158,305 
<151> 1999-10-08 

<150> US 60/190,024 
<151> 2000-03-17 

<160> 34 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 47981 
<212> DNA 

<213> Micromonospora megalomicea 

<220> 

<221> CDS 

<222> (1) . . . (144) 

<223> megBVI (megT), TDP-4-keto-6-deoxyglucose-2,3-dehydratase;
SEQ ID NO: 2= translated amino acid sequence 

<221> CDS 

<222> (928) . . . (2061) 

<223> megDVI, TDP-4-keto-6-deoxyglucose 3,4-isomerase,
TDP-4-keto-6-deoxyhexose 3,4-isomerase;
SEQ ID NO: 3= translated amino acid sequence 

<221> CDS 

<222> (2072) . . . (3382) 

<223> megDI, rhodosaminyl transferase (eryCIII homolog),
TDP-megosamine glycosyl transferase; 
SEQ ID NO: 4= translated amino acid sequence 

<221> CDS 

<222> (3462) . . . (4634) 

<223> megG (megY), mycarosyl acyltransferase, mycarose O-acyltransferase;
SEQ ID NO: 5= translated amino acid sequence 

<221> CDS 

<222> (4651) . . . (5775) 

<223> megDII, deoxysugar transaminase (eryCI, DnrJ homolog), 
TDP-3-keto-6-deoxyhexose 3-aminotransaminase;
SEQ ID NO: 6= translated amino acid sequence 

<221> CDS 

<222> (5822) . . . (6595) 

<223> megDIII, daunosaminyl-N,N-dimethyltransferase (eryCVI homolog);
SEQ ID NO: 7= translated amino acid sequence

<221> CDS

<222> (6592) . . . (7197) 

<223> megDIV, TDP-4-keto-6-deoxyglucose 3,5-epimerase (eryBVII, dnmU
homolog), TDP-4-keto-6-deoxyhexose 3,5-epimerase;
SEQ ID NO: 8= translated amino acid sequence 

<221> CDS 

<222> (7220) . . . (8206) 

<223> megDV, TDP-hexose 4-ketoreductase (eryBIV, dnmV homolog), 
TDP-4-keto-6-deoxyhexose 4-ketoreductase; 
SEQ ID NO: 9= translated amino acid sequence

<221> CDS 

<222> (8228) . . . (9220) 

<223> megBII-1 (megDVII), TDP-4-keto-L-6-deoxy-hexose 2,3-reductase;
SEQ ID NO: 10= translated amino acid sequence 

<221> CDS 

<222> (9226) . . . (10479) 

<223> megBV, mycarosyl transferase, mycarose glycosyltransferase;
SEQ ID NO: 11= translated amino acid sequence 

<221> CDS 

<222> (10483) . . . (11424) 

<223> megBIV, TDP-hexose 4-ketoreductase,
TDP-4-keto-6-deoxyhexose 4-ketoreductase;
SEQ ID NO: 12= translated amino acid sequence 

<221> CDS 

<222> (12181) . . . (22821) 

<223> megAI; SEQ ID NO: 13= translated amino acid sequence 

<221> misc_feature 
<222> (12505) . . . (13470) 
<223> megAI, AT-L 

<221> misc_feature 
<222> (13576) . . . (13791) 
<223> megAI, ACP-L 

<221> misc_feature 
<222> (13849) . . . (15126) 
<223> megAI, KS1 

<221> misc_feature
<222> (15427) . . . (16476)
<223> megAI, AT1

<221> misc_feature 
<222> (17155) . . . (17694) 
<223> megAI, KR1 

<221> misc_feature 
<222> (17947) . . . (18207) 
<223> megAI, ACP1

<221> misc_feature 
<222> (18268) . . . (19548)
<223> megAI, KS2 

<221> misc_feature
<222> (19876) . . . (20910)
<223> megAI, AT2

<221> misc_feature 
<222> (21517) . . . (22053) 
<223> megAI, KR2 

<221> misc_feature 
<222> (22318) . . . (22575) 
<223> megAI, ACP2 

<221> CDS 

<222> (22867) . . . (33555) 

<223> megAII; SEQ ID NO: 14= translated amino acid sequence 

<221> misc_feature 
<222> (22957) . . . (24237) 
<223> megAII, KS3 

<221> misc_feature 
<222> (24544) . . . (25581) 
<223> megAII, AT3

<221> misc_feature 
<222> (26230) . . . (26733) 
<223> megAII, KR3 (inactive) 

<221> misc_feature 
<222> (26998) . . . (27258) 
<223> megAII, ACP3 

<221> misc_feature
<222> (27393) . . . (28590) 
<223> megAII, KS4 

<221> misc_feature
<222> (28897) . . . (29931)
<223> megAII, AT4

<221> misc_feature
<222> (29953) . . . (30477) 
<223> megAII, DH4 

<221> misc_feature 
<222> (31396) . . . (32244) 
<223> megAII, ER4 

<221> misc_feature
<222> (32257) . . . (32799) 
<223> megAII, KR4 

<221> misc_feature
<222> (33052) . . . (33312) 
<223> megAII, ACP4 

<221> CDS 

<222> (33666) . . . (43271) 

<223> megAIII; SEQ ID NO: 15= translated amino acid sequence 

<221> misc_feature
<222> (33780) . . . (35027)
<223> megAIII, KS5

<221> misc_feature 
<222> (35385) . . . (36419) 
<223> megAIII, AT5

<221> misc_feature 
<222> (37068) . . . (37604) 
<223> megAIII, KR5 

<221> misc_feature 
<222> (37860) . . . (38120) 
<223> megAIII, ACP5 

<221> misc_feature 
<222> (38187) . . . (39470) 
<223> megAIII, KS6

<221> misc_feature 
<222> (39795) . . . (40811) 
<223> megAIII, AT6

<221> misc_feature 
<222> (41406) . . . (41936) 
<223> megAIII, KR6 

<221> misc_feature 
<222> (42168) . . . (42425) 
<223> megAIII, ACP6 

<221> misc_feature 
<222> (42585) . . . (43271) 
<223> megAIII, TE 

<221> CDS 

<222> (43268) . . . (44344) 

<223> megCII, TDP-4-keto-6-deoxyglucose 3,4-isomerase;
SEQ ID NO: 16= translated amino acid sequence 

<221> CDS 

<222> (44355) . . . (45623) 

<223> megCIII, desosaminyl transferase, desosamine glycosyltransferase; 
SEQ ID NO: 17= translated amino acid sequence 

<221> CDS 

<222> (45620) . . . (46591) 

<223> megBII-2 (megBII), TDP-4-keto-6-deoxy-L-glucose 2,3 dehydratase,
TDP-4-keto-6-deoxyglucose 2,3 dehydratase; 
SEQ ID NO: 18= translated amino acid sequence 

<221> CDS 

<222> (46660) . . . (47403) 

<223> megH, TEII; SEQ ID NO: 19= translated amino acid sequence 
<221> CDS 

<222> (47411) . . . (47980) 

<223> megF, C-6 hydroxylase; SEQ ID NO: 20= translated amino acid sequence 
<400> 1 

ctcgagccga tgctcggcgg cgcggtgggc caaccagtcg tggacgtcgt cggtggcggt 60

gggaggtccg ccgtgccgag tcaggaaacg tattgccgat tgtgtggatt ccggagtcgc 120

atgaccgttg acccgatccc ccatacgcct ctcccgtgat gtcgtgggcg gtccgtgcgg 180

taccgcccgg actgacattc gtcgatcaag accccgccca gtgtagggct ccgcccgcga 240 

cgggagaagg tccgtcgaac aacttccggg tgaccggtcg ccggcgtcgg tgaaacgggc 300 

gtcggagcac ccgatcattg ctgtcggtga acttcctaac tgtcggcgcg cacatctttc 360 

tgaccggtgt gttccgtggt atgacgcgtt cccggcccgt ctggaactgt gcgtgggact 420 

gaccggttgc ggcgtgtttt cgcccgtttc cgaactgcgg attcgtcgat cgcgcaggtg 480 

ggagcgggtg gctgaccggg atgatctgca atcatggcgc tcaatgacga tctcttgtag 540 

catggtccgc gccgagggtc cgacaggccc gaaacgcccg gcatccagcc tgttcgacga 600 

cgtcgacatc accgtgcaag ccgcgatgac accgacacca cgccatgctg gtgccgcact 660 

ggaagggtgg cgcgatcagg gaaatggccg tgtcactaga cagacgccaa acagctgtcc 720 

gggcctgcgg aaacagcatc gatctgcgtc agccgttcat tgccccggcg gcaccgcctt 780 

ggaaatccgt gccaccggtc gtccgcagtg acgatcgcgg acccgggttt cgagacagca 840

ggtagtaggc gatgcaggcg tttcgtctcg cgccggacgc gtcgcactag gtggaatccg 900 

tcacagtctt caatccggga gcgttctatg gcagttggcg atcgaaggcg gctgggccgg 960 

gagttgcaga tggcccgggg tctctactgg gggttcggtg ccaacggcga tctgtactcg 1020 

atgctcctgt ccggacggga cgacgacccc tggacctggt acgaacggtt gcgggccgcc 1080 

ggacggggac cgtacgccag tcgggccgga acgtgggtgg tcggtgacca ccggaccgcc 1140 

gccgaggtgc tcgccgatcc gggcttcacc cacggcccgc ccgacgctgc ccggtggatg 1200 

caggtggccc actgcccggc ggcctcctgg gccggcccct tccgggagtt ctacgcccgc 1260 

accgaggacg cggcgtcggt gacagtggac gccgactggc tccagcagcg gtgcgccagg 1320

ctggtgaccg agctggggtc gcgcttcgat ctcgtgaacg acttcgcccg ggaggtcccg 1380 

gtgctggcgc tcggtaccgc gcccgcactc aagggcgtgg accccgaccg tctccggtcc 1440 

tggacctcgg cgacccgggt atgcctggac gcccaggtca gcccgcaaca gctcgcggtg 1500 

accgaacagg cgctgaccgc cctcgacgag atcgacgcgg tcaccggcgg tcgggacgcc 1560 

gcggtgctgg tgggggtggt ggcggagctg gcggccaaca cggtgggcaa cgccgtcctg 1620 

gccgtcaccg agcttcccga actggcggca cgacttgccg acgacccgga gaccgcgacc 1680 

cgtgtggtga cggaggtgtc gcggacgagt cccggcgtcc acctggaacg ccgcaccgcc 1740 

gcgtcggacc gccgggtggg cggggtcgac gtcccgaccg gtggcgaggt gacagtggtc 1800 

gtcgccgcgg cgaaccgtga tcccgaggtc ttcaccgatc ccgaccggtt cgacgtggac 1860 

cgtggcggcg acgccgagat cctgtcgtcc cggcccggct cgccccgcac cgacctcgac 1920 

gccctggtgg ccaccctggc cacggcggcg ctgcgggccg ccgcgccggt gttgccccgg 1980 

ctgtcccgtt ccgggccggt gatcagacga cgtcggtcac ccgtcgcccg tggtctcagc 2040

cgttgcccgg tcgagctgta gaggaagaac gatgcgcgtc gtgttttcat cgatggctgt 2100 

caacagccat ctgttcgggc tggtcccgct cgcaagcgcc ttccaggcgg ccggacacga 2160 

ggtacgggtc gtcgcctcgc cggccctgac cgacgacgtc accggtgccg gtctgaccgc 2220 

cgtgcccgtc ggtgacgacg tggaacttgt ggagtggcac gcccacgcgg gccaggacat 2280 

cgtcgagtac atgcggaccc tcgactgggt cgaccagagc cacaccacca tgtcctggga 2340 

cgacctcctg ggcatgcaga ccaccttcac cccgaccttc ttcgccctga tgagccccga 2400 

ctcgctcatc gacgggatgg tcgagttctg ccgctcctgg cgtcccgact ggatcgtctg 2460 

ggagccgctg accttcgccg ccccgatcgc ggcccgggtc accggaaccc cgcacgcccg 2520 

gatgctgtgg ggtccggacg tcgccacccg ggcccggcag agcttcctgc gactgctggc 2580 

ccaccaggag gtggagcacc gggaggatcc gctggccgag tggttcgact ggacgctgcg 2640

gcgcttcggc gacgacccgc acctgagctt cgacgaggaa ctggtgctgg ggcagtggac 2700 

cgtggacccc atccccgagc cgctgcggat cgacaccggc gtccggacgg tgggcatgcg 2760 

gtacgtcccc tacaacggcc cctcggtggt gcccgcctgg ctgttgcggg aacccgaacg 2820 

tcggcgggtc tgcctgaccc tcggcggttc cagccgggaa cacggcatcg ggcaggtctc 2880 

catcggcgag atgttggacg ccatcgccga catcgacgcc gagttcgtgg ccaccttcga 2940 

cgaccagcag ttggtcggcg tgggcagcgt tccggcaaac gtccgtaccg ccgggttcgt 3000 

gccgatgaac gtcctgctgc ccacctgcgc ggccaccgtg caccacggcg gcaccggcag 3060

ttggctgacc gccgccatcc acggcgtacc gcagatcatc ctctcggacg ccgacaccga 3120 

ggtgcacgcc aagcagctcc aggacctcgg cgcggggctg tcgctcccgg tcgcggggat 3180 

gaccgccgag cacctgcgtg gggcgatcga gcgggttctc gacgagccgg cgtaccgcct 3240

cggtgcggag cggatgcggg acgggatgcg gaccgacccg tcgccggccc aggtggtcgg 3300 

catctgtcag gacctggccg ccgaccgggc ggcacgcggc aggcagccgc gtcgaaccgc 3360 

cgagccgcac ctgccgcgat gacttccacc accaccggga ccggctgatg ccggtcccgg 3420 

aatccacacg ccgactttcc ttctgacacg agggggcccc ggtggttacc tccaccaact 3480 

tggacacgac agcacggccg gcactgaact cgttgaccgg gatgcggttc gtcgccgcct 3540 

tcctggtctt cttcacgcac gtcctgtcga ggctcatccc gaacagctac gtgtacgccg 3600 

acggcctgga cgccttctgg cagaccaccg gacgggtggg ggtgtcgttc ttctttattc 3660 

tcagcggttt cgtgctgacc tggtcggcgc gggccagcga ctcggtgtgg tcgttctggc 3720

gcagacgggt ctgcaagctc ttccccaacc acctggtcac cgccttcgcc gccgtggtgt 3780

tgttcctggt caccgggcag gcggtgagcg gtgaggcgct gatcccgaac ctcctgctga 3840

tccacgcctg gttcccggcc ctggagatct ccttcggcat caacccggtg agctggtcgt 3900 

tggcctgcga ggcgttcttc tacctgtgct tcccgctgtt cctgttctgg atctccggta 3960 

tccgcccgga gcggctgtgg gcctgggccg ccgtggtgtt cgccgcgatc tgggcggtac 4020 

cggtggtcgc cgacctcctg ctgccgagtt ccccgccgct gatcccgggg cttgagtact 4080 

ccgccatcca ggactggttc ctctacacct tccctgcgac gcggagcctg gagttcatcc 4140 

tcgggatcat cctggcccgc atcctgatca ccggtcggtg gatcaacgtc gggctgctcc 4200 

ccgcggtgct gttgttcccg gtcttcttcg tcgcctcgct cttcctgccg ggtgtctacg 4260 

ccatctcctc gtcgatgatg atccttcccc tggttctgat catcgccagc ggcgcgacgg 4320 

ccgacctcca gcagaagcgc accttcatgc gtaaccgggt gatggtgtgg ctcggcgacg 4380 

tctccttcgc gctctacatg gtccacttcc tggtgatcgt ctacggggcg gacctgctgg 4440 

ggttcagcca gaccgaggac gccccgctgg gtctcgcact cttcatgatc attccgttcc 4500 

tcgcggtctc cctggtgctg tcgtggctgc tgtacaggtt cgtcgagcta cccgtcatgc 4560 

gtaactgggc ccgcccggcc tccgcccggc gcaaacccgc cacggaaccc gaacagaccc 4620

cttcccgccg gtaagaagga cggtgcatcg gtgaccacct acgtctggtc ctatctgttg 4680

gagtacgaga gggaacgagc cgacatcctc gatgcggtgc agaaggtctt cgccagtggc 4740 

agcctgatcc tcggtcagag tgtggagaac ttcgagaccg agtacgcccg ctaccacggg 4800 

atcgcgcact gcgtgggcgt cgacaacggc accaacgctg tgaaactcgc gctggagtcg 4860 

gtaggtgtcg gacgcgacga cgaggtcgtc acggtctcca acaccgccgc ccccacagtc 4920

ctggccatcg acgagatcgg cgcccggccg gtcttcgtgg acgtccgcga cgaggactac 4980

ctcatggaca ccgacctggt ggaggcggcg gtcaccccgc gtaccaaggc catcgtcccg 5040 

gtgcacctgt acgggcagtg cgtggacatg acagccctgc gggaactggc cgaccggcgg 5100 

ggcctcaagc tcgtggagga ctgcgcccag gcccacggtg cccggcggga cggtcggctg 5160 

gccgggacga tgagcgacgc ggcggccttc tcgttctacc cgacgaaggt cctcggcgcc 5220 

tacggcgacg gcggcgcggt cgtcaccaac gacgacgaga cagcccgcgc cctgcgacgg 5280 

ctgcggtact acgggatgga ggaggtctac tacgtcaccc ggaccccggg tcacaacagc 5340 

cgcctcgacg aggtgcaggc cgagatcctg cggcgcaaac tgacccggct cgacgcgtac 5400

gtcgcgggtc ggcgggcggt cgcccagcgg tacgtcgacg ggctcgccga cctccaagac 5460

tcgcacggcc tcgaactccc agtggtcacc gacggcaacg aacacgtctt ctacgtgtac 5520 

gtcgtccgcc acccgcgccg cgacgagatc atcaagcgtc tccgggacgg gtacgacatc 5580 

tccctgaaca tcagctaccc ctggccggtg cacaccatga ccggcttcgc ccacctcggt 5640 

gtcgcgtcgg ggtcgctgcc ggtcaccgaa cggctggccg gcgagatctt ctcccttccc 5700

atgtacccct ccctccctca cgacctgcag gacagggtga tcgaggcggt gcgggaggtc 5760

atcaccgggc tgtgacgagc ccgcgtgtcg tcagcgaaga cccactctgg aagggccggt 5820 

catgccgaac agccactcga ccacgtcgag caccgacgtc gccccgtacg agcgggcgga 5880 

catctaccac gacttctacc acggccgtgg caagggatac cgtgccgaag ccgacgcgct 5940 

cgtggaggtc gcccgcaagc acaccccaca ggcggcgacc ctgctggacg tggcctgcgg 6000 

gaccggatcc cacctggtcg agctggcgga cagcttccgg gaggtggtgg gggtcgacct 6060 

gtcggccgcc atgctcgcca ccgccgcccg caacgacccc gggcgggaac tgcaccaggg 6120 

cgacatgcgc gacttctccc tcgaccgcag gttcgacgtc gtcacctgca tgttcagctc 6180 

caccggttac ctcgtcgacg aggccgaact ggaccgtgcc gtggcgaacc tggccggtca 6240

cctcgcgcct ggcggcaccc tcgtcgtgga gccctggtgg ttcccggaga cgttccggcc 6300 

cggctgggtc ggggccgacc tggtcaccag cggtgaccgg aggatctccc ggatgtcgca 6360 

caccgtcccg gcgggtctgc ccgaccgcac cgcctcccgg atgaccatcc actacacggt 6420 

ggggtcaccg gaggccggga tcgagcactt caccgaggtg cacgtgatga ccctgttcgc 6480

ccgcgccgcc tacgagcagg ccttccagcg ggcgggcctg agctgctcgt acgtcggcca 6540 

cgacctgttc tcgccgggcc ttttcgtcgg ggtcgccgcg gagccggggc ggtgagggtc 6600 

gaggagctgg gcatcgaggg ggtcttcacc ttcaccccgc agacgttcgc cgacgagcgg 6660 

ggggtgttcg gcacggcgta ccaggaggac gtgttcgtgg cggcgctcgg ccgcccgctg 6720 

ttcccggtgg cccaggtcag caccacccgg tcccggcggg gtgtggtccg gggggtgcac 6780 

ttcacgacga tgcccggctc catggcgaag tacgtctact gcgccagggg tagggcgatg 6840 

gacttcgccg tcgacatccg gcccggttcc ccgaccttcg gccgggccga gccggtcgag 6900 

ctctccgccg agtcgatggt cgggctgtac cttcccgtgg gcatgggcca cctgttcgtc 6960 

tccctggagg acgacaccac cctcgtctac ctgatgtccg ccggttacgt ccccgacaag 7020 

gaacgggcgg tgcaccccct ggatccggag ctggcgttgc cgatcccggc cgacctcgac 7080 

ctcgtcatgt ccgagcggga ccgggtcgca cccaccctcc gggaggcccg ggaccagggg 7140 

atcctgcccg actacgccgc ctgccgggcc gccgcgcacc gggtggtgcg gacgtgaccc 7200 

cggccgggcg tgcgggccgg tggtggtgct cggcgcgtcg ggtttcctgg gttcggcggt 7260 

cacccacgcc ctggccgacc tcccggtgcg ggtgcggctc gtcgcccggc gggaggtcgt 7320 

cgtgccctcc ggtgccgtcg ccgactacga gacgcaccgg gtggacctca ccgaacccgg 7380

agcgctcgcg gaggtggtcg cggacgcccg ggcggtcttc ccgttcgccg cccagatcag 7440

gggtacgtca gggtggcgga tcagcgagga cgacgtggtc gccgaacgga cgaacgtcgg 7500

cctggtccgg gacctgatcg ccgtcctgtc ccgctcgccg cacgccccgg tggtggtctt 7560

cccgggcagc aacacgcagg tcggcagggt caccgccggc cgggtcatcg acggcagcga 7620

gcaggaccac cccgagggcg tctacgacag gcagaaacac accggggaac agctgctcaa 7680

ggaggccact gcggccgggg cgatccgggc gaccagtctg cggctgcccc cggtgttcgg 7740

ggtgcccgcc gccggcaccg ccgacgaccg gggggtggtc tccaccatga tccgtcgggc 7800

cctgaccggc caaccgctga cgatgtggca cgacggcacc gtccggcgtg aactgctgta 7860

cgtgaccgac gccgcccggg ccttcgtcac cgccctggac cacgccgacg cgctcgccgg 7920

acgccacttc ctgttgggga cggggcgttc ctggccgctg ggcgaggtct tccaggcggt 7980

ctcgcgcagc gtcgcccggc acaccggcga ggacccggtg ccggtggtct cggtgccgcc 8040

tccggcgcac atggacccgt cggacctgcg cagcgtggag gtcgaccccg cccggttcac 8100

ggctgtcacc gggtggcggg ccacggtcac gatggcggag gcggtcgacc ggacggtggc 8160

ggcgttggcc ccccgccggg ccgccgcccc gtccgagccc tcctgaccgg ggtcacccgg 8220

gttcgtccta cggcaccggc ccgtcgacgg ccggtgccgg gaagatcgct tcgagttccc 8280

ggagttcctc ctcgcccagc gtcagctcgg cggcccgtaa cgccgagtcg agctgctcgg 8340

gtgtgcgggg gccgatgaca gcgcccagga tcccggggcg ggacaggacc caggccagac 8400

cgacctcggc cgggtccgcg ccgaggcgtc ggcagtagtc ctcgtacgcc tcgacgaggg 8460

ggcgtacggc ggggaggagc acctgggcgc gtccctgcgc cgacttgacg gcggttccgg 8520

ctgccaactt ctccagtacg ccgctgagca gcccgccgtg caggggggac caggcgaaca 8580

cgcccacccc gtacgcctgg gcggcgggca ggacgtccag ctcggggtgg cggacggcca 8640

ggttgtacag gcactggtgg gagatcatgc cgagcaggtt gcggcgtgcc gcgctctcct 8700

gggcggcggc gatgtgccag cccgccaggt tggaggagcc gacgtacccg accttcccac 8760

tgccgaccag atgttcggcg gcctgccaca cctcgtccca cggtgcggcg cggtcgatgt 8820

ggtgcgtctg gtagatgtcg atgtggtcga ccccgaggcg gcggagggag ttctcgcagg 8880

cggcgacgat gtgtcgggcg gagagcccgc cgtcgttgac ccgttcgctc atctcgctgc 8940

ccaccttggt cgccaggacg gtctcctcgc gtcgacctcc gccctgggcg aaccaccgtc 9000

cgacgagttc ctcggtgtgg cccttgtaga gccgccagcc gtagatgtcg gcggtgtcga 9060

tgcagttgac gccccgctcg agggcgtggt ccatcagccg cagcgcgtcg tcgtcggtca 9120

cccgtccact gaagttcacg gtgccgagcc agagtcggct ggtgtgcaac gccgatcgtc 9180

cgacgcgtac ccgggcggac ccggccccgg tggttcccac gtcggtcacc tgtcggcgcg 9240

gtgctggtgg gcgagcgcct ccagcacggg tacgacctcg gcgggggtcg gcgcggccag 9300

cgcctcctgc cgcagcttct cggcgttctc ggcgtgggaa cggtcctcga ccactgtggc 9360

gagagcctgc cagagggtgt cggcgtcgac ctcgtccgga cggaggaaga cacccgctcc 9420

cagctcggcg gtgcgctgac cacgcaggac acagtcccac tcgtgggcga cggagatctg 9480

cggtacgccg tggtgcagcg cggtggccca gcttccggca ccgccgtggt ggatgacggc 9540

ggcacagccc ggcagcagga tgttcatggg aacgaagtcc accaggcgga cgttgtccgg 9600

caccgacgcc ggatcgagcc cggagcgggt caccacgatc tcgccgtcga accgcgcgag 9660

ggtggccagt gtccggagga actcctgcgg gttcgaggtg atgcccagcg ccgagtatcc 9720

cccggtgaag cagacccggc ggactccgtc cgaggtcctg agccactgcg gcacgacgga 9780

ggacccgttg tagggcaaag tccgggtgtg caccgactcc agtccggtct ccaggcggaa 9840

gctctcgggc agctggtcga cgctccactg tccgacagcg aggtcctcgc tgtagtcgag 9900

gccgaaccgg ccggcgacct cggtgagcca gccgccgagc gggtccggcc ggtcgtcggc 9960

gggacgctgc ccgcgcaggt cctgggagcg gctgcggaag tagccggtga ggtcgctgcc 10020

ccacagcagc cgggcgtggg cggccccgca ggccttggcc gcgaccgccc cggcgaaggt 10080

gaagggctcc cagagcacca ggtcgggacg ccagtccatg gcgaactcga cgagttcgtc 10140

gacgaaggag tcgttgttga ccaccgggaa gacgaaccgg gaggtggcct cctcgatgcc 10200

gtgcaggaac tcccacgagc gcagttccgg tccgcgtcgg gcgaagtcca ggtcggtggt 10260

gtagcggtgc acctgcgcgg cggcctcagg ggagatgtcg aagagtcggt ggtccgagcc 10320

gagtggcacc gaggtcagtc ccgcgccgac gacgacgtcg gtgagctcgg gctgactggc 10380

cacccggacg tcgtggccgg cggtgtgcag cgcccaggcc agggggacga ggccctggaa 10440

gtgggtacgg tgcgcgaacg aggtgagcag gacccgcact ggtcactcct tggtcgagat 10500

gagggcggca acggtccggt cgatgccctc ggccagcggc acccgggggt gccagccggt 10560

cagcgtccgg aactcggtgg agtcgaagtc gtcgctgcgg aagtcgttgg cctcggcgtt 10620

ctccggtgga gggacgctga cgacgggcac cgcagggttg ccggtctgac gtgccacgct 10680

ggcggcgacg gtctcgaaga tctcgccgag gggtcgggcc tcgtccgcgc tcggcgtcca 10740

gacgtcgccg accagcgcct cgtggttgtg cagtgcggcg gtgaacgcgg tggccacgtc 10800

ctcgacgtgc aggaggttgc ggcgcacgct gccctcgtgc cacatcgtga tcggctcacc 10860

ggcgagggct cgccggatca tggcggtgac gacaccccgg ccggtctgcc ccgacgggcc 10920

gctgtggccg tagatcgcgg gcaggcgcag gatcaccccg tcgacgaccc cgtcctcggt 10980

ggcctgacgc aggatccgct cggcctcgat cttgtgctgg gcgtaccggc tgggggcggc 11040

ggggttcgcg gcctgggtgg tgctggcgaa caggagcacc ggcgcgggtc cgggtcttgc 11100

ccgcagcgcg gcgacgaggt cgcgcatgat gcccgcgttg acgcgttcgg cctcgggcac 11160

cgtggcggcg ctgcgccagg tcgacccgcc ggcggcgtag gcgaccagat gcacgacgac 11220 

gtcggtgtcg gcgacgacct gcgcgacccg gccgggttcg agcaggtcga ctcgaaggtg 11280 

ctcgatcccg gcgctgcctg gtggctggtc gcgagacccg gtgcgcgcga cggcccgcag 11340 

tcggagaggg tgtgtggtaa attcgcgaag aagggcgctt ccgacgaatc cagaaacgcc 11400 

gagaagtgtg acatgtcttg tcatctacta atgcattccg atagccaccg gcgcatggaa 11460 

tccatttgtt ccccccaggg tggtgtcggg tgacaaatcc ggcctcaggt cggcctcaag 11520 

cctctttcga gcgggtgctg aggcttcccg cgtaccctcg gtggcctgcg ttcgggcggg 11580 

tgtcggggaa agggcggatc gaggagttcg gtagggcgtc gcggcgcgta ctccgggact 11640 

gatccgggtc gacgccccga cgcgtgacag ggcgtcgatc cgtgccgccc gtaccgccgg 11700 

ttttcggcga tggtcgcaga ttcctcccga cgtggtggac tcattggttc tcccgggtgt 11760 

ggccgcaccg tcggtggcct cgtcgggggt gtcggagacc gggtcgatcg ccgtccccgg 11820 

ccgtgccgac cagggtcggt ccgtcgccga ggtgggtcac cgtcgggtgg acccggtccg 11880 

ccggcggcca ccgcccgatc gtgcccacct tcgcctccgc gggtaaatgc ttcgtcgatc 11940 

tgatcgacac ttccggcgac gctatcaccg gagcattccc cggcaccacc ggtcgatgcc 12000 

tcgcgctttc caaacaggga aaacagcagc tcacagcggt tccaggcgcc gggcaatcct 12060 

agcgaagagt ctcgatgggg tcaaggtgaa ttctgtcaca gatgtttttg ttaaatgtac 12120 

tttcttcagc caccctcgac gttcatacaa ttggccggca tctctaccaa gggggagtga 12180 

gtggttgacg tgcccgatct actcggcacc cggactccgc acccagggcc gctcccattc 12240 

ccgtggcccc tgtgcggtca caacgaaccg gagctgcggg cccgcgcccg tcaattgcac 12300 

gcatatctcg aaggcatttc cgaggatgac gtggtggccg tcggcgccgc cctcgcgcgc 12360 

gagacacgcg cgcaggacgg gccgcaccgc gccgtcgtcg tggcctcctc ggtcaccgag 12420 

ctgaccgccg cgctcgccgc cctcgcccag ggccgcccac acccctcggt ggtacgcggt 12480 

gtcgcccgac ccacggcacc ggtggtgttc gtcctgcccg gtcagggcgc ccagtggccc 12540 

ggcatggcga cccgactgct cgccgagtcg cccgtcttcg ccgcggcgat gcgggcctgc 12600 

gagcgggcct tcgacgaggt caccgactgg tcgttgaccg aggtcctgga ctcacccgag 12660 

cacctgcgcc gcgtcgaggt ggtccagccc gcgctcttcg cggtgcagac ctcactggcc 12720 

gccctgtggc ggtcgttcgg ggtgcgaccc gacgccgtac tcggacacag catcggtgag 12780 

ctggccgccg ccgaggtctg cggcgccgtc gacgtcgagg ccgccgcgcg ggccgccgcc 12840 

ctgtggagcc gcgagatggt cccactggtg ggccggggtg acatggcggc ggtggcgctc 12900 

tccccggccg agctggcagc ccgggtcgag cggtgggacg acgacgtcgt gccggccggg 12960 

gtcaacggtc cccggtcggt gctgctcacc ggcgctcccg agcccatcgc acggcgggtc 13020 

gccgagctgg cggcacaggg cgtacgcgcc caggtcgtca acgtgtcgat ggcggcgcac 13080 

tcggcgcagg tcgacgccgt cgccgagggc atgcgctcgg cgctgacctg gttcgccccc 13140 

ggcgactccg acgtgcccta ctacgccggc ctcaccggcg ggcggctgga cacccgggaa 13200 

ctcggcgccg accactggcc gcgcagtttc cggctcccgg tgcgcttcga cgaggcgacc 13260 

cgtgcggtcc tggaactgca gcccggcacg ttcatcgagt cgagcccgca cccggtgctg 13320 

gcggcctccc tgcagcagac cctcgacgag gtcgggtccc cggccgcgat cgtgccgacc 13380 

ctgcaacgcg accagggcgg tctgcggcgg ttcctgctcg ccgtggcgca ggcgtacacc 13440 

ggtggcgtga cagtcgactg gaccgccgcc taccccgggg tgacccccgg ccacctgccg 13500 

tcggccgtcg ccgtcgagac cgacgaggga ccctcgacgg agttcgactg ggccgcgccc 13560 

gaccacgtac tgcgcgcgcg gctgctggag atcgtcggcg ccgagacggc cgcgctcgcc 13620 

gggcgggagg tcgacgcccg ggccaccttc cgggaactgg gcctcgactc ggtcctcgcg 13680 

gtgcagctgc ggacccgcct cgccacggcg accgggcggg atctgcacat cgccatgctc 13740 

tacgaccacc cgaccccgca cgccctcacc gaggcgctgc tgcgcggccc gcaggaggag 13800 

ccggggcggg gtgaggagac ggcacacccg acggaggccg aacccgacga acccgtcgcc 13860 

gtggtcgcca tggcgtgccg gctgcccggc ggcgtcacct caccggagga gttctgggag 13920 

ctgctggccg aggggcggga cgccgtcggc gggctgccca ccgaccgggg atgggacctg 13980 

gactcgctgt tccacccgga cccgacccgg tcgggcacgg cgcaccagcg cgctggtggc 14040 

ttcctcaccg gcgccacctc cttcgacgct gccttcttcg ggctgtcgcc acgggaggca 14100 

ctggccgtcg agccgcagca gcggatcacg ttggagctgt cgtgggaggt gctggaacgc 14160 

gccgggatcc ccccgacgtc gttgcggacc tcccggaccg gggtgttcgt cggtctgatc 14220 

ccccaggagt acggcccccg gctggccgag gggggtgagg gcgtcgaggg ctacctgatg 14280 

accgggacca ccaccagcgt cgcctccggt cgggtcgcct acaccctcgg cctggagggg 14340 

ccggcgatca gcgtcgacac cgcctgctcg tcgtcgctcg tcgccgtgca cctggcgtgc 14400 

cagtcgctgc ggcgcggcga gtcgacgatg gcgctcgccg gtggcgtgac ggtgatgccg 14460 

acaccgggca tgctcgtgga cttcagtcgg atgaactccc tcgcccccga cggacggtcc 14520 

aaggcgttct cggccgccgc cgacgggttc ggcatggccg aaggcgcagg gatgctcctg 14580 

ctggaacggc tctcggacgc ccgccgccac ggccacccgg tgctcgccgt gatcaggggc 14640 

accgctgtca actccgacgg cgcgagcaac ggactctccg ccccgaacgg ccgggcccag 14700 

gtccgggtga tccgacaggc cctcgccgag tccgggctga cgccccacac cgtcgacgtc 14760 

gtggagaccc acggcaccgg cacccgcctc ggtgatccga tcgaggcacg ggcgctctcc 14820 

gacgcgtacg gcggtgaccg tgagcacccg ctgcggatcg gctcggtcaa gtccaacatc 14880 

gggcacaccc aggccgccgc cggtgtcgcc ggtctgatca aactggtgtt ggcgatgcag 14940 

gccggtgtcc tgccccgcac cctgcacgcc gacgagccgt caccggagat cgactggtcc 15000 

tcgggcgcga tcagcctgct ccaggagccc gctgcctggc ccgccggcga gcggccccgc 15060 

cgggccgggg tgtcctcgtt cggcatcagc ggcaccaacg cacacgcgat catcgaggag 15120 

gcgccgccga ccggtgacga cacccgaccc gaccggatgg gcccggtggt gccctgggtg 15180 

ctctcggcga gcaccggcga ggcgttgcgc gcccgggcgg cgcggctggc cgggcaccta 15240 

cgcgagcacc ccgaccagga cctggacgac gtcgcctact cgctggccac cggtcgggcc 15300 

gcgctggcgt accgtagtgg gttcgtgccc gccgacgcgt ccacggcgct gcggatcctc 15360 

gacgaactcg ccgccggtgg atccggggac gcggtgaccg gcaccgcccg cgccccgcag 15420 

cgcgtcgtct tcgtcttccc cggccaggga tggcagtggg cggggatggc agtcgacctg 15480 

ctcgacggcg acccggtctt cgcctcggtg ctgcgggagt gcgccgacgc gttggaaccg 15540 

tacctggact tcgagatcgt cccgttcctg cgggccgagg cgcagcgccg gacccccgac 15600 

cacacgctct ccaccgaccg cgtcgacgtg gtccagccgg tgctgttcgc ggtgatggtg 15660 

tccctggcgg cccggtggcg ggcgtacggg gtggaaccgg cggccgtcat cggacactcc 15720 

cagggggaga ttgccgcggc gtgtgtggcc ggggcgctct cgctggacga cgcggcccgg 15780 

gcggtggccc tgcgcagccg ggtcatcgcc accatgcccg gcaacggcgc gatggcctcg 15840 

atcgccgcct ccgtcgacga ggtggcggcc cggatcgacg ggcgggtcga gatcgccgcc 15900 

gtcaacggtc cgcgcgcggt ggtggtctcc ggcgaccgtg acgacctgga ccgcctggtc 15960 

gcctcctgca ccgtcgaggg ggtgcgggcc aagcggctgc cggtggacta cgcgtcgcac 16020 

tcctcgcacg tcgaggccgt ccgtgacgcg ctccacgccg aactcggcga gttccggccg 16080 

ctgccgggct tcgtgccgtt ctactcgaca gtcaccggcc gctgggtcga gcccgccgaa 16140 

ctcgacgccg ggtactggtt tcgcaacctg cgccacaggg tccggttcgc cgacgcggtc 16200 

cgctccctcg ccgaccaggg gtacacgacg ttcctggagg tcagcgccca cccggtgctc 16260 

accacggcga tcgaggagat cggtgaggac cgtggcggtg acctcgtcgc tgtccactcg 16320 

ctgcgacgtg gggccggcgg tcccgtcgac ttcggctccg cgctggcccg cgccttcgtg 16380 

gccggcgtcg cagtggactg ggagtcggcg taccagggtg ccggggcgcg tcgggtgccg 16440 

ctgcccacgt acccgttcca gcgtgagcgc ttctggttgg aaccgaatcc ggcccgcagg 16500 

gtcgccgact ccgacgacgt ctcgtccctg cggtaccgca tcgaatggca cccgaccgat 16560 

ccgggtgagc cgggacggct cgacggcacc tggctgctgg cgacgtaccc cggtcgggcc 16620 

gacgaccggg tcgaggcggc gcggcaggcg ctggagtccg ccggggcgcg ggtcgaggac 16680 

ctggtggtgg agccccggac gggccgggtc gacctggtgc ggcggctcga cgccgtgggt 16740 

ccggtggcgg gcgtgctctg cctgttcgct gtcgcggagc cggcggccga acactccccg 16800 

ctggcggtga cgtcgttgtc ggacacgctc gacctgaccc aggcggtggc cgggtcgggc 16860 

cgggagtgtc cgatctgggt ggtcaccgag aacgccgtcg ccgtcgggcc cttcgaacgg 16920 

ctccgcgacc cggcccacgg cgcgctctgg gccctcggtc gggtcgtcgc cctggagaac 16980 

cccgccgtct ggggcggcct ggtcgacgtg ccgtcgggtt cggtcgccga gctgtcgcgt 17040 

cacctcggga cgaccctgtc cggcgccggc gaggaccagg tcgccctccg acccgacggg 17100 

acgtacgccc gccggtggtg cagggcgggc gcgggcggca cgggccggtg gcagccccgg 17160 

ggcacggtgc tcgtcaccgg cggcaccggc ggggtcggtc ggcacgtcgc ccggtggctg 17220 

gcccgccagg gcaccccgtg cctggtgctg gccagccgcc ggggaccgga cgccgacggg 17280 

gtcgaggagc tactcaccga actcgccgac ctgggcaccc gggccaccgt caccgcctgc 17340 

gacgtcaccg accgggagca gctccgtgcc ctcctcgcga ccgtcgacga cgagcacccg 17400 

ctgtcggcgg tgttccacgt cgccgcgacg ctcgacgacg gcaccgtcga gaccctcacc 17460 

ggtgaccgca tcgaacgggc caaccgggcg aaggtgctcg gtgcccgcaa cctgcacgag 17520 

ctgacccggg acgccgacct cgacgcgttc gtgctcttct cctcctccac cgccgcgttc 17580 

ggcgcgccgg ggctcggcgg ctacgtcccg ggcaacgcct acctcgacgg tctcgcccag 17640 

cagcgacgca gcgagggact cccggccacc tcggtggcgt ggggtacctg ggcgggcagc 17700 

gggatggccg agggtccggt cgccgaccgg ttccgccggc acggggtcat ggagatgcac 17760 

cccgaccagg ccgtcgaggg tctccgggtg gcactggtgc agggtgaggt agccccgatc 17820 

gtcgtcgaca tcaggtggga ccggttcctc ctcgcgtaca ccgcgcagcg ccccacccgg 17880 

ctcttcgaca ccctcgacga ggcccgtcgg gccgcgcccg gtcccgacgc cgggccgggg 17940 

gtggcggcgc tggccgggct gcccgtcggg gaacgcgaga aggcggtcct cgacctggta 18000 

cggacgcacg cggctgccgt cctcggccac gcctcggccg agcaggtgcc cgtcgacagg 18060 

gccttcgccg aactcggcgt cgactcgctg tcggccctgg aactgcgcaa ccggctgacc 18120 

actgcgaccg gggtccggct ggccacgacg acggtcttcg accacccgga cgtacggacc 18180 

ctggccggac acctggccgc cgaactgggc ggcggatcgg ggcgggagcg gcccgggggc 18240 

gaggccccga cggtggcccc gaccgacgag ccgatcgcca tcgtcgggat ggcctgccgg 18300 

ctgccggggg gagtggactc accggagcag ctgtgggagt tgatcgtctc cgggcgggac 18360 

accgcctcgg cggcacccgg ggaccggagc tgggatccgg cggagttgat ggtctccgac 18420 

acgacgggca cccgtaccgc cttcggcaac ttcatgcccg gggcgggcga gttcgacgcg 18480 

gcgttcttcg ggatctcgcc gcgtgaggcg ttggcgatgg atccgcagca gcggcacgcc 18540 

ctggagacca cctgggaggc gctggagaac gccggtatcc ggcccgagtc gttgcgcggt 18600 

acggacaccg gtgtcttcgt gggcatgtcc catcaggggt acgccaccgg ccgcccgaag 18660 

cccgaggacg aggtcgacgg ctacctgttg acaggcaaca ccgcgagcgt cgcctccggt 18720 

cggatcgcgt acgtgttggg gttggagggg ccggcgatca ctgtggacac ggcgtgttcg 18780 

tcgtcgcttg tggcgttgca cgtggcggcg ggttcgttgc gttctgggga ctgtggtctg 18840 

gcggtggcgg gtggggtgtc ggtgatggcc ggtccggagg tgttcaggga gttctcccgg 18900 

cagggcgcgt tggctccgga cggcaggtgc aagcccttct cggacgaggc cgacggcttc 18960 

ggtctggggg aggggtcggc cttcgtcgtg ttgcagcggt tgtcggtggc ggtgcgggag 19020 

gggcgtcggg tgttgggtgt ggtggtgggt tcggcggtga atcaggatgg ggcgagtaat 19080 

gggttggcgg cgccgtcggg ggtggcgcag cagcgggtga ttcggcgggc gtggggtcgt 19140 

gcgggtgtgt cgggtgggga tgtgggtgtg gtggaggcgc atgggacggg gacgcggttg 19200 

ggggatccgg tggagttggg ggcgttgttg gggacgtatg gggtgggtcg gggtggggtg 19260 

ggtccggtgg tggtgggttc ggtgaaggcg aatgtgggtc atgtgcaggc ggcggcgggt 19320 

gtggtgggtg tgatcaaggt ggtgttgggg ttgggtcggg ggttggtggg tccgatggtg 19380 

tgtcggggtg ggttgtcggg gttggtggat tggtcgtcgg gtgggttggt ggtggcggat 19440 

ggggtgcggg ggtggccggt gggtgtggat ggggtgcgtc ggggtggggt gtcggcgttt 19500 

ggggtgtcgg ggacgaatgc tcatgtggtg gtggcggagg cgccggggtc ggtggtgggg 19560 

gcggaacggc cggtggaggg gtcgtcgcgg gggttggtgg gggtggttgg tggtgtggtg 19620 

ccggtggtgc tgtcggcaaa gaccgaaacc gccctgcacg cccaggcacg tcgactcgcc 19680 

gaccacctgg agacgcaccc cgacgtcccg atgaccgacg tggtgtggac gctgacgcag 19740 

gcccgccaac gcttcgacag gcgcgcggtc ctcctcgccg ccgaccggac ccaggccgtg 19800 

gaacggctgc gcggcctcgc cgggggcgaa ccggggaccg gtgtggtgtc gggggtggcg 19860 

tcgggtggtg gtgtggtgtt tgtttttcct ggtcagggtg gtcagtgggt ggggatggcg 19920 

cgggggttgt tgtcggttcc ggtgtttgtg gagtcggtgg tggagtgtga tgcggtggtg 19980 

tcgtcggtgg tggggttttc ggtgttgggg gtgttggagg gtcggtcggg tgcgccgtcg 20040 

ttggatcggg tggatgtggt gcagccggtg ttgttcgtgg tgatggtgtc gttggcgcgg 20100 

ttgtggcggt ggtgtggggt tgtgcctgcg gcggtggtgg gtcattcgca gggggagatc 20160 

gcggcggcgg tggtggcggg ggtgttgtcg gtgggtgatg gtgcgcgggt ggtggcgttg 20220 

cgggcgcggg cgttgcgggc gttggccggc cacggcggca tggcctcggt acgccgaggc 20280 

cgcgacgacg tacagaagct cctcgacagc ggcccctgga cggggaagct ggagatcgcc 20340 

gcggtcaacg gccccgacgc ggtggtggtc tccggcgacc cccgagccgt gaccgagctg 20400 

gtcgagcact gtgacgggat cggggtccgg gcccggacga tccccgtcga ctacgcctcc 20460 

cactccgcac aggtcgagtc gctccgggag gagctgctct ccgtcctggc cgggatcgag 20520 

ggccgcccgg cgacggtgcc gttctactcc accctcaccg gtgggttcgt cgacggcacc 20580 

gaactggacg ccgactactg gtaccgcaac ctgcgccacc cggtgcggtt ccacgccgcc 20640 

gtcgaggcgc tggcagcgcg tgacctcacc acgttcgtcg aggtcagccc gcaccccgtg 20700 

ctgtcgatgg cggtcgggga gacgcttgcc gacgtggagt ccgccgtcac tgtgggcacc 20760 

ctggaacgcg acaccgacga cgtcgagcgc ttcctcacct ccctcgccga ggcgcacgtc 20820 

cacggcgtac ccgtggactg ggcggcggtc ctcggctccg gaaccctggt cgacctgccc 20880 

acctatccct tccagggacg gcggttctgg ctgcaccccg accgtggtcc gcgtgacgat 20940 

gtcgccgact ggttccaccg ggtcgactgg acggcgacgg ccaccgacgg gtcggcccga 21000 

ctcgacggtc gctggctggt ggtcgtaccc gaggggtaca cggacgacgg ctgggtcgtg 21060 

gaggtgcggg ccgccctcgc cgccggtggt gccgagccgg tggtgacgac ggtcgaggag 21120 

gtcaccgacc gggtcggtga cagcgacgcg gtggtgtcga tgctcgggct ggccgacgac 21180 

ggtgcggccg agaccctggc gctgctgcga cgactcgacg cacaggcgtc caccacccca 21240 

ctgtgggtgg tcaccgtggg ggccgtcgcc cccgccggtc cggtgcagcg ccccgaacag 21300 

gcgacggtgt gggggttggc ccttgtcgcc tccctggaac gcggacaccg gtggaccggc 21360 

ctgctggatc tgccgcagac accggacccg cagctacgac cccggctggt cgaggcgctc 21420 

gccggtgccg aggaccaggt agcggtccgc gccgacgccg tacacgcccg tcggatcgtc 21480 

cccaccccgg tcaccggagc cgggccgtac accgccccgg gcgggacgat cctcgtcacc 21540 

gggggcaccg ccggtctggg tgccgtcacc gcccgatggc tcgccgagcg cggtgccgaa 21600 

cacctcgccc tggtcagccg gcgcgggccg ggcaccgccg gcgtcgacga ggtggtccgg 21660 

gacctgaccg ggctcggcgt acgggtgtcg gtgcactcct gcgacgtcgg cgaccgcgag 21720 

tcggtcggcg ccctggtgca ggagttgaca gcagccggtg acgtggtccg gggggtggtc 21780 

cacgctgccg gtctgcccca gcaggtgcca ctgaccgaca tggacccggc cgacctcgcc 21840 

gacgtggtgg ccgtgaaggt cgacggcgcg gtgcacctgg ccgacctgtg cccggaggcc 21900 

gaactgttcc tgctgttctc ctccggggcc ggggtgtggg gcagtgcccg tcagggtgcg 21960 

tacgccgccg gaaacgcctt cctggacgcc ttcgcccgac accggcggga ccggggtctg 22020 

cccgccacct cggtggcgtg ggggctctgg gcggccgggg ggatgacagg ggaccaggag 22080 

gcggtgtcgt tcctgcgtga gcggggcgta cggccgatgt cggtgccgag ggcactggaa 22140 

gcgctggaac gggtcctcac cgccggggag accgcggtgg tcgtcgccga cgtcgactgg 22200 

gcggccttcg ccgagtcgta cacctccgcc cggccccggc cgctgctcca ccggctcgtc 22260 

acacctgcgg cggcggtcgg cgagcgcgac gagccgcgtg agcagaccct ccgggaccgg 22320 

ctggcggccc tgccccgggc cgagcggtcg gcggagctgg tacgcctggt ccggcgggac 22380 

gccgcagccg tgctcggcag cgacgcgaag gccgtacccg ccaccacgcc gttcaaggac 22440 

ctcgggttcg actcgctggc cgcggtccgg ttccgtaacc ggctggccgc ccacaccggt 22500 

ctgcgtctgc cggccaccct ggtcttcgag cacccgaacg ccgcagccgt cgccgacctc 22560 

ctccacgacc gactcggcga ggccggcgag ccgacccccg tccggtcggt gggcgccgga 22620 

ctggccgcgc tggagcaggc cctgcccgac gcctccgaca cggagcgggt cgagctggtc 22680 

gagcgcctgg aacggatgct cgccgggctc cgccccgagg ccggagccgg ggccgacgcc 22740 

ccgaccgccg gtgacgacct gggggaggcc ggcgtcgacg aactcctcga cgcgctcgaa 22800 

cgggaactcg acgccaggtg aacccgaact gaccgcagcc gcagccgaag cagagaccga 22860 

ggacctgtga ctgacaacga caaggtggcg gagtacctcc gtcgtgcgac gctcgacctg 22920 

cgggccgccc gcaagcgcct gcgcgagctg caatccgacc cgatcgcggt cgtcggcatg 22980 

gcctgccgcc taccgggcgg ggtgcacctc ccgcagcacc tgtgggacct cctgcgccag 23040 

gggcacgaga cggtgtccac cttccccacc gggcgcggct gggacctggc cgggctcttc 23100 

cacccggacc ccgaccaccc cggcaccagc tacgtcgacc ggggtgggtt cctcgacgac 23160 

gtggcgggct tcgacgccga gttcttcggg atctccccgc gcgaggccac ggccatggac 23220 

ccgcaacagc ggctgctgtt ggagaccagt tgggagctgg tggagagcgc cggcatcgat 23280 

ccgcactccc tgcgtggcac cccgaccggc gtcttcctcg gcgtggcgcg gctcggctac 23340 

ggcgagaacg gcaccgaagc cggtgacgcc gagggctatt cggtgaccgg ggtggcaccc 23400 

gctgtcgcct ccgggcggat ctcctacgcc ctcgggctgg agggtccgtc gatcagcgtg 23460 

gacaccgcgt gctcgtcgtc gttggtggcg ctgcacctgg cggtcgagtc gctgcggctg 23520 

ggcgagtcga gtctcgctgt cgtcggcggg gcggcggtca tggcgacacc aggggtgttc 23580 

gtcgacttca gccgccagcg ggcgttggcc gctgacggca ggtcgaaggc cttcggggcc 23640 

gccgccgacg ggttcggctt ctccgagggg gtctccctcg tcctgctcga acggctctcc 23700 

gaggccgaaa gcaacggcca cgaggtgttg gctgtcatcc gtggctccgc cctcaaccag 23760 

gacggggcca gcaacggtct cgccgcgccg aacgggaccg cccagcgcaa ggtgatccgg 23820 

caggcgctac gaaactgcgg cctgaccccg gccgacgtgg acgccgtgga ggcgcacggc 23880 

accggcacca cgctcggcga cccgatcgag gccaacgccc tgctggacac ctacggccgt 23940 

gaccgggatc cggaccaccc gctgtggctg gggtcggtga agtcgaacat cggccacacg 24000 

caggcggcgg cgggcgtcac cgggctgctc aagatggtgc tggcactgcg ccacgaggaa 24060 

ctgcccgcca ccctgcacgt cgacgagccc accccgcacg tggactggtc ctcgggagcg 24120 

gtacgcctgg cgacccgggg ccggccgtgg cggcggggtg accggccgag gcgggccggg 24180 

gtgtcggcgt tcggcatcag cgggaccaac gcccacgtga tcgtcgagga ggcacccgag 24240 

cggaccaccg agcgcaccgt cggcggcgac gtcggcccgg tcccgctcgt ggtgtccgcc 24300 

cggtcggcgg cggcgctacg ggcccaggcg gcccaggtcg ccgagctggt ggagggctcc 24360 

gacgtcgggc tggcggaggt cgggcggagc ctggccgtga cccgggcgcg acacgagcac 24420 

cgggcggcgg tggtggcgtc gacccgggcc gaggcggtgc gggggctgcg cgaggtcgcg 24480 

gcggtcgaac cgcgcggcga ggacaccgtc accggggtcg ccgagacgtc cgggcgcacc 24540 

gtcgtcttcc tcttcccggg acaggggtcc cagtgggtcg ggatgggcgc ggagctgctg 24600 

gactcggcac cggcgttcgc cgacacgatc cgcgcctgcg acgaggcgat ggcaccgttg 24660 

caggactggt cggtctccga cgtgctccgg caggagccgg gggcaccggg actggaccgg 24720 

gtcgacgtgg tgcagccggt gctgttcgcg gtgatggtgt cgttggcgcg gttgtggcag 24780 

tcgtacgggg tcacccccgc tgcggtggtg gggcactcgc agggggagat cgccgccgcc 24840 

cacgtggcgg gtgcgctctc cctcgccgac gcggcgaggc tggtggtggg ccgcagccgg 24900 

ttgctgcggt cgctgtccgg gggcggcggc atgagcgccg tcgcgctcgg tgaggccgag 24960 

gtacgccgcc gactgcggtc gtgggaggac cggatctccg tggccgccgt caacggaccc 25020 

cggtcggtgg tggtggccgg ggaaccggag gcgctgcggg agtggggacg ggagcgggag 25080 

gccgagggcg tacgggtccg cgagatcgac gtcgactacg cctcgcactc gccgcagatc 25140 

gacagggtcc gtgacgaact cctgacggtc acgggggaga tcgagccccg gtcggcggag 25200 

atcaccttct actcgacggt cgacgtccgt gctgtcgacg gcaccgacct ggacgcgggg 25260 

tactggtacc gcaacctgcg ggagacggtc cggttcgccg acgcgatgac ccggttggcc 25320 

gactcgggat acgacgcgtt cgtcgaggtc agcccgcatc cggtggtggt gtcggcggtc 25380 

gccgaggcgg tcgaggaggc aggtgtcgag gacgccgtcg tcgtcggcac cctgtcccgg 25440 

ggcgacggcg gaccgggggc gttcctgcgg tcggcggcca ccgcccactg cgccggtgtg 25500 

gacgtcgact ggacgcccgc cctcccggga gctgcgacga tcccgttgcc gacgtacccg 25560 

ttccaacgga agccgtactg gctgcggtcg tctgctcccg cccccgcctc ccacgatctc 25620 

gcctaccggg tgtcctggac gccgatcacc ccgcccgggg acggcgtact cgacggcgac 25680 

tggctggtgg tgcaccccgg gggcagcacc ggatgggtcg acgggttggc ggcggcgatc 25740 

accgccggcg gtggccgggt cgtcgcccac ccggtggact ccgtgacctc ccggaccggc 25800 

ctggccgagg cgctcgcccg gcgggacggc acgttccggg gggtgctgtc gtgggtggcg 25860 

accgacgaac ggcacgtcga ggccggtgcg gtcgccctgc tgaccctggc gcaggcgttg 25920 

ggtgacgccg gaatcgacgc accactgtgg tgcctgaccc aggaggcggt ccgtaccccc 25980 

gtcgacggtg acctggcccg accggcgcag gccgccctgc acggtttcgc ccaggtcgcc 26040 

cggctggagc tggcccgccg cttcggtggg gtgctcgacc tgcccgccac cgtcgacgcc 26100 

gccgggacgc gtctggtcgc ggcggtcctc gccggcggcg gcgaggacgt cgtcgccgtc 26160 

cgtggcgacc gtctctacgg ccgtcgcctg gtcagggcga ccctgccgcc gcccggcggg 26220 

gggttcaccc cgcacggcac cgtcctggtc accggcgcgg ccggtccggt gggcggtcgg 26280 

ctggcccggt ggctcgccga acggggtgcc acccgactcg tcctgcccgg cgcacacccg 26340 

ggcgaggagt tgctgaccgc gatccgggcc gccggtgcca ccgccgtggt gtgcgaaccg 26400 

gaggcggagg cactgcgtac ggcgatcggc ggggagttgc cgaccgcgct cgtacacgcc 26460 

gagacgttga cgaacttcgc cggcgtcgcc gacgccgacc ccgaggactt cgccgccacc 26520 

gtcgcggcga agaccgcgct gccgacggtc ctggcggagg tgctcggcga ccaccgcctc 26580 

gaacgggagg tctactgctc gtcggtggcc ggggtctggg gtggggtcgg catggccgcg 26640 

tacgccgccg gcagcgccta cctcgacgcc ctggtcgagc accgtcgcgc ccgggggcac 26700 

gccagcgcct cggtggcctg gaccccgtgg gccctgcccg gcgcggtcga cgacggtcgg 26760 

ctgcgcgagc gcggcctgcg cagcctcgac gtggccgacg ccctcgggac gtgggaacgt 26820 

ctgctccgcg ccggtgcggt gtcggtggcc gtcgccgacg tcgactggtc ggtcttcaca 26880 

gagggtttcg cggccatccg gccgaccccg ctcttcgacg aactcctcga ccggcgcggg 26940 

gaccccgacg gcgcgcccgt cgaccggccg ggggagccgg cgggcgagtg gggtcgacga 27000 

atcgcggcgc tgtccccgca ggaacagcgg gagacgttgc tgaccctcgt cggcgagacg 27060 

gtcgcggagg tgctgggaca cgagaccggc accgagatca acacccgtcg ggccttcagc 27120 

gaactcggcc tcgactcgct gggctcgatg gccctgcgtc agcgcctggc ggcccgtacc 27180 

ggcctgcgga tgccggcctc gctggtcttc gaccacccga cggtcaccgc gctcgcgcgg 27240 

tacctgcgtc gactggtcgt cggggactcc gacccgaccc cggtacgggt gttcggcccc 27300 

accgacgagg ccgaacccgt cgccgtggtc ggcatcggct gccggttccc cggcggcatc 27360 

gccacccccg aggacctctg gcgggtggtg tccgagggca cctccatcac caccggattc 27420 

cccaccgacc ggggctggga cctccggcgg ctctaccacc ccgacccgga ccaccccggc 27480 

accagctacg tcgacagggg gggattcctc gacggggccc cggacttcga ccccgggttc 27540 

ttcgggatca ccccccgcga ggcgctggcg atggacccgc agcagcggct caccctggag 27600 

atcgcgtggg aggcggtgga acgggcgggc atcgacccgg agaccctcct cggcagcgac 27660 

accggcgtct tcgtcggcat gaacggccag tcctacctgc aactgctgac cggggagggt 27720 

gaccggctca acggctacca ggggttgggc aactcggcga gcgtgctctc cggccgtgtc 27780 

gcctacacct tcgggtggga ggggccggcg ctgacggtgg acaccgcctg ctcgtcctcg 27840 

ctggtcgcca tccacctcgc catgcagtcg ctgcgtcggg gtgagtgctc gctggcgttg 27900 

gccggcgggg tgacggtcat ggccgacccg tacaccttcg tggacttcag cgcacagcgg 27960 

gggctcgccg ccgacgggcg gtgcaaggcg ttctccgcgc aggccgacgg gttcgccctc 28020 

gccgagggcg tcgcggcgct cgtcctcgaa ccgttgtcca aggcgcggcg aaacggccac 28080 

caggtgctgg cggtgctgcg cggcagcgcc gtcaaccagg acggggccag caacggcctc 28140 

gccgccccga acgggccgtc gcaggaacgg gtgatcaggc aggccctgac cgcctccggg 28200 

ctgcgtcccg ccgacgtcga catggtggag gcgcacggga cgggcaccga actcggcgac 28260 

ccgatcgagg ccggggcgct catcgcggcg tacggccggg accgggaccg gccgctctgg 28320 

ctgggctcgg tgaagacgaa catcggccac acccaggccg ccgccggtgc cgccggggtg 28380 

atcaaggcgg tcctggcgat gcggcacggc gtactcccga ggtcgctgca cgccgacgag 28440 

ttgtccccgc acatcgactg ggcggacggg aaggtcgagg tgctccgcga ggcacgacag 28500 

tggccccccg gtgagcgccc ccgccgcgcc ggggtgtcct ccttcggcgt cagcgggacc 28560 

aacgcccacg tcatcgtcga ggaggcaccc gccgaaccgg accccgaacc ggttcccgcc 28620 

gccccgggcg ggcccctgcc cttcgtcctg cacggacgca gcgtccagac ggtccggtcc 28680 

caggcgcgga ccctcgccga acacctgcgc accaccggcc accgggacct cgccgacacc 28740 

gcccgtaccc tggccaccgg tcgcgcccgt ttcgacgtcc gggccgcagt gctcggcacc 28800 

gaccgggagg gtgtctgcgc cgccctcgac gcgctggcgc aggatcgccc ctcgcccgac 28860 

gtcgtcgccc cggcggtctt cgccgcccgt acccccgtcc tggtcttccc cgggcagggg 28920 

tcgcagtggg tcggcatggc ccgtgacctg ctcgactcct ccgaggtgtt cgccgagtcg 28980 

atgggccggt gcgccgaggc gctgtcgccg tacaccgact gggacctgct cgacgtggtc 29040 

cgtggggtcg gcgaccccga cccgtacgac cgggtggacg tgctccagcc ggtgctgttc 29100 

gcggtgatgg tgtcgctggc gcggttgtgg cagtcgtacg gggtgactcc gggtgcggtg 29160 

gtgggtcact cgcaggggga gatcgccgcc gcgcacgtgg ctggtgcgtt gtcgttggcc 29220 

gacgccgcca gggtggtggc gttgcgcagc cgggtgctgc gggagctcga cgaccagggc 29280 

ggcatggtgt cggtcggcac ctcccgcgcc gagttggact cggtcctgcg ccggtgggac 29340 

gggcgggtcg cggtggcggc ggtgaacgga cccggcacgc tcgtggtggc cggacccacc 29400 
gccgaactgg acgagttcct cgcggtggcc gaggcccgcg agatgaggcc gcgtcggatc 29460 

gcggtgcgct acgcgtcgca ctccccggag gtggcccggg tcgaacagcg gctcgccgcc 29520 

gaactcggca ccgtcaccgc cgtcggcggc acggtcccgc tctactccac cgccaccggg 29580 

gacctcctcg acaccacagc catggacgcc gggtactggt accgcaacct gcgccaaccg 29640 

gtgctgttcg agcacgccgt ccgcagcctc ctggagcggg gattcgagac gttcatcgag 29700 

gtcagcccgc accctgtgct gctgatggcg gtcgaggaga ccgccgagga cgccgagcgc 29760 

ccggtcaccg gcgtgccgac gctgcgccgc gaccacgacg ggccgtcgga gttcctccgc 29820 

aacctcctgg gggcgcacgt gcacggggtc gacgtcgacc tgcgtccggc ggtcgcccac 29880 

ggccgcctgg tcgacctgcc cacctacccc ttcgacaggc agcggctctg gcccaagccg 29940 

caccgcaggg ccgacacctc gtcgctgggg gtccgtgact cgacccaccc gctgctgcac 30000 

gccgcagtcg acgtacccgg tcacggcgga gcggtgttca ccgggcggct ctcccccgac 30060 

gagcagcagt ggctgaccca gcacgtggtg ggtgggcgga acctggtgcc cggcagtgtc 30120 

ctggtcgacc tcgcgctcac cgccggggcc gacgtcggcg tgccggtgct ggaggaactc 30180 

gtcctgcagc agccgctggt gttgaccgcc gccggtgcgt tgctgcgcct gtcggtcggc 30240 

gccgccgacg aggacgggcg gcggccggtc gagatccacg ccgccgagga cgtctccgac 30300 

ccggccgagg cccggtggtc ggcgtacgcg accgggaccc tcgccgtcgg cgtggccggc 30360 

ggcggccggg acggcacaca gtggcccccg cccggcgcca ccgccctgac gttgaccgac 30420 

cactacgaca ccctcgccga actgggctac gagtacgggc cggcgttcca ggcgctgcgc 30480 

gccgcgtggc agcacggcga cgtggtctac gcggaggtgt ccctcgacgc cgtcgaggag 30540 

gggtacgcgt tcgacccggt gctgctcgac gccgtcgccc agaccttcgg cctgaccagt 30600 

cgcgcccccg ggaagctccc cttcgcctgg cggggcgtca ccctgcacgc caccggggcc 30660 

actgcggtac gggtggtggc gacccccgcc ggaccggacg cggtggccct gcgggtcacc 30720 

gacccgaccg gtcagctcgt cgccacggtg gacgccctgg tcgtcaggga cgccggggcg 30780 

gatcgggacc agccgcgcgg ccgcgacggc gacctgcacc gcctggagtg ggtacggctg 30840 

gccaccccgg acccgacccc ggcggcggtg gtgcacgtgg cggccgacgg gctcgacgac 30900 

ctgctgcgcg ccggtggtcc ggcaccacag gccgtcgtcg tccgctaccg tcccgacggc 30960 

gacgacccga cggccgaggc ccgtcacggg gtgctctggg cggccacgct cgtgcgccgt 31020 

tggctcgacg acgaccggtg gcccgccacc accctggtgg tggccacgtc cgcaggggtc 31080 

gaggtctccc ccggggacga cgtgccgcgc cccggggccg ccgccgtgtg gggggtgctg 31140 

cgctgcgccc aggcggagtc cccggaccgc ttcgtgctcg tcgacggcga cccggagacg 31200 

cccccggcgg tgccggacaa tccgcagctc gcggtccgtg acggtgcggt gttcgtgcca 31260 

cggctgacgc cgctcgccgg tcccgtgccg gccgtcgccg accgggcgta ccggctggtg 31320 

cccggcaacg gcggctccat cgaggcagtg gccttcgccc ccgtccccga cgccgaccgg 31380 

cccctggcgc cggaggaggt acgcgtcgcc gtccgcgcca ccggcgtgaa cttccgtgac 31440 

gtcctgctcg cgctcggcat gtacccggaa ccggccgaga tgggcaccga ggcgtccggt 31500 

gtggtcaccg aggtcgggtc gggtgtccgg cggttcaccc ccggccaggc ggtgacgggc 31560 

ctgttccagg gggccttcgg gccggtggcg gtcgccgacc accggctcct caccccggtc 31620 

cccgacgggt ggcgggcggt ggacgccgca gccgtaccca tcgcgttcac caccgcccac 31680 

tacgcgctgc acgacctggc cgggttgcag gccgggcagt ccgtgctggt ccacgccgcc 31740 

gccggcgggg tggggatggc tgccgtcgcg ttggcccgtc gggccggggc ggaggtgttc 31800 

gccacggcca gcccggccaa acacccgacg ctgcgggcgc tcggcctcga cgacgaccac 31860 

atcgcctcgt cccgggagag cgggttcggt gagcggttcg ccgcgcgtac cggggggcgg 31920 

ggcgtcgacg tggtcctgaa ctcgctcacc ggcgacctgc tcgacgagtc cgcgcggctg 31980 

ctcgccgacg gcggggtctt cgtcgagatg ggcaagaccg acctgcggcc ggcggagcag 32040 

ttccggggcc ggtacgtccc gttcgacctg gccgaggccg gtcccgatcg gctcggcgag 32100 

atcctggagg aggtcgtcgg tctgctggcc gccggtgccc tcgaccggtt gccggtgtcg 32160 

gtgtgggagt tgtcggcggc cccggccgcg ctcacccaca tgagccgggg ccgacacgtg 32220 

ggcaagctcg tcctcaccca gcccgccccc gtgcaccccg acggaacggt gctggtcacc 32280 

ggcgggaccg gcaccctggg gcggctggtc gcccgccacc tggtgaccgg gcacggcgta 32340 

ccccacctcc tggtggccag ccggcgcggt ccggcggccc cgggcgcggc cgagctgcgc 32400 

gccgacgtcg aaggcctcgg cgcgaccatc gagatcgtcg cctgcgacac cgccgaccgg 32460 

gaggcgctcg cggcgctgct cgactcgatc cccgcggacc gtccgctgac cggggtggtg 32520 

cacaccgccg gggtcctggc cgacgggctg gtcacctcca tcgacgggac cgccaccgat 32580 

caggtcctgc gggccaaggt cgacgcggcg tggcacctgc acgacctgac ccgggacgcg 32640 

gacctgagct tcttcgtgct gttctcgtcg gcggcgtcgg tgctggccgg tcccgggcag 32700 

ggcgtgtacg cggcggccaa cggggtcctc aacgccctgg ccgggcaacg gcgggccctc 32760 

ggactgcccg cgaaggcgct cgggtggggc ctgtgggcgc aggccagcga gatgaccagc 32820 

ggcctcggtg accggatcgc ccgtaccggg gtcgccgcgc tgccgaccga gcgggcgctg 32880 

gccctgttcg acgcggctct gcgcagcggc ggggaggtgc tgttcccgct gtctgtcgac 32940 

aggtcggcgc tgcgccgggc cgagtacgtc cccgaggtgc tgcgcggcgc ggtccggtcc 33000 

acgccacggg ccgccaacag ggccgagacc ccgggccggg gcctgctcga ccgtctcgtc 33060 

ggtgcacccg agaccgatca ggtggccgcg ctggccgagc tggtccgctc gcacgcggcg 33120 

gcggtcgccg gctacgactc ggccgaccag ctgcccgaac gcaaggcgtt caaggacctc 33180 

gggttcgact cgctggcggc ggtggagctg cgcaaccggc tcggcgtcac caccggcgta 33240 

cggctgccca gcacgctggt gttcgaccac ccgacaccgc tggcggtggc cgaacacctg 33300 

cggtcggagt tgttcgccga ctccgcgccg gacgtcgggg tcggtgcgcg cctcgacgac 33360 

ctggaacggg cgctcgacgc cctgcccgac gcgcagggac acgccgacgt cggggcccgc 33420 

ctggaggcgc tgctgcgccg gtggcagagc cgacgacccc cggagaccga gccagtgacg 33480 

atcagtgacg acgccagtga cgacgagctg ttctcgatgc tcgacaggcg tctcggcggg 33540 

ggaggggacg tctaggtgac aggtcgattc cgccccgcgg cagtggaccg taccgccctg 33600 

acaggtccac cgggttcgcg tcgcctccca cacccgacgg ccggggtatc cacggaaggg 33660 

atccgatgag cgagagcagc ggcatgaccg aggaccgcct ccggcgctat ctcaagcgca 33720 

ccgtcgccga actcgactcg gtgacaggtc ggctcgacga ggtcgagtac cgggcccgcg 33780 

aaccgatcgc cgtcgtcggc atggcctgcc ggttccccgg gggtgtggac tcgccggagg 33840 

cgttctggga gttcatccgc gacggtggtg acgcgatcgc cgaggcgccc acggaccgtg 33900 

gctggccgcc ggcaccgcga ccccgcctcg gtggtctcct cgcggagccg ggcgcgttcg 33960 

acgccgcctt cttcggcatc tcaccccgcg aggcgctcgc gacggacccc cagcagcgcc 34020 

tgatgctgga gatctcctgg gaggcgttgg agcgtgcggg tttcgacccg tcgagcctgc 34080 

gcggcagcgc cggtggcgtc ttcaccggtg tcggtgcggt ggactacgga cccaggccgg 34140 

acgaggcacc cgaggaggtg ctcggctacg tcggcatcgg caccgcctcc agcgtcgcct 34200 

ccggacgggt ggcgtacacc ctggggttgg agggtccagc cgtcaccgtc gacaccgcct 34260 

gctcctccgg gctcaccgcg gtgcacctgg cgatggagtc gctgcgccgc gacgagtgca 34320 

ccctggtcct cgccggtggg gtcaccgtga tgagcagccc gggtgcgttc accgagttcc 34380 

gcagccaggg cgggttggcc gaggacggcc gctgcaaacc gttctcccgc gccgccgacg 34440 

gcttcgggct cgccgagggg gccggggtcc tggtgctcca acggctgtcc gtcgcccggg 34500 

ccgagggccg gccggtgctg gccgtactgc gtggctcggc gatcaaccag gacggtgcca 34560 

gcaacgggct caccgcgccg agcggccccg cccagcggcg ggtgatcagg caggcgttgg 34620 

agcgggcgcg gctgcgtccc gtcgacgtgg actacgtgga ggcccacggc accggcaccc 34680 

ggctgggcga tccgatcgag gcgcacgccc tgctcgacac gtacggtgcc gaccgggaac 34740 

ccggccgccc gctctgggtc ggatcggtga agtccaacat cggtcacacc caggcggcgg 34800 

cgggggtggc cggggtgatg aagaccgtgc tggcgctgcg gcatcgggag atcccggcga 34860 

cgttgcactt cgacgagccc tcgccgcacg tcgactggga ccggggtgcg gtgtcggtgg 34920 

tgtccgagac ccggccctgg ccggtggggg agcgcccgcg ccgggcgggg gtgtcctcgt 34980 

tcggcatcag cggcaccaac gcgcacgtca tcgtcgagga ggcgccgagc ccgcaggcgg 35040 

ccgacctcga cccgaccccc ggcccggcaa ccggagcgac ccccggaacg gatgccgccc 35100 

ccaccgccga gccgggtgcg gaggcggtcg cactggtgtt ctccgcgcgc gacgagcggg 35160 

ccctgcgcgc ccaggcggcc cggctcgccg accgtctcac cgacgacccg gccccctcgt 35220 

tgcgcgacac cgccttcacc ctggtcaccc gccgtgccac ctgggagcat cgggcggtcg 35280 

tcgtcggcgg gggcgaggag gtcctcgccg gcctccgggc cgtcgccggg ggacgtcccg 35340 

tcgacggagc cgtcagcggg cgggcgcgcg ccggccgccg ggtggtgctg gtcttccccg 35400 

ggcagggcgc acagtggcag ggcatggccc gggacctgct gcggcagtcg ccgaccttcg 35460 

cggagtccat cgacgcctgc gagcgggcgc tcgccccgca cgtggactgg tcgctgcgcg 35520 

aggtgctcga cggcgagcag tcgttggacc ccgtcgacgt ggtgcagccg gtgctgttcg 35580 

cggtgatggt gtcgttggcg cggttgtggc agtcgtacgg ggtgactccg ggtgcggtgg 35640 

tgggtcactc gcagggggag atcgccgccg cgcacgtggc tggtgcgttg tcgttggccg 35700 

acgccgccag ggtggtggcg ttgcgcagcc gggtgctgcg ccgtctcggt ggtcacggcg 35760 

ggatggcgtc gttcgggctc caccccgacc aggccgccga gcggatcgcg cgcttcgcgg 35820 

gtgcgctgac tgtcgcctcg gtcaacggtc cccgttcggt ggtgctggcc ggggagaacg 35880 

gcccgttgga cgagctgatc gccgagtgcg aggccgaggg cgtgaccgcc cgtcggatcc 35940 

ccgtcgacta cgcctcacac tccccgcagg tggagtcgct gcgtgaggag ctgctcgccg 36000 

cactggccgg ggtccgtccg gtgtcggccg ggatccccct gtactcgacc ctgaccggtc 36060 

aggtcatcga aacggcgacg atggacgccg actactggtt cgccaacctc cgggagccgg 36120 

tgcgcttcca ggacgccacc aggcagctcg ccgaggcggg gttcgacgcc ttcgtcgagg 36180 

tcagcccgca cccggtgttg acagtcggtg tcgaggccac cctcgaggca gtgctgcccc 36240 

ccgacgcgga tccgtgtgtc acaggcaccc tgcgccgcga acgcggcggt ctcgcgcagt 36300 

tccacaccgc gctcgccgag gcgtacaccc ggggggtgga ggtcgactgg cgtaccgcag 36360 

tgggtgaggg acgcccggtc gacctgccgg tctacccgtt ccaacgacag aacttctggc 36420 

tcccggtccc cctgggccgg gtccccgaca ccggcgacga gtggcgttac cagctcgcct 36480 

ggcaccccgt cgacctcggg cggtcctccc tggccggacg ggtcctggtg gtgaccggag 36540 

cggcagtacc cccggcctgg acggacgtgg tccgcgacgg cctggaacag cgcggggcga 36600 

ccgtcgtgtt gtgcaccgcg cagtcgcgcg cccggatcgg cgccgcactc gacgccgtcg 36660 

acggcaccgc cctgtccact gtggtctctc tgctcgcgct cgccgagggc ggtgctgtcg 36720 

acgaccccag cctggacacc ctcgcgttgg tccaggcgct cggcgcagcc gggatcgacg 36780 

tccccctgtg gctggtgacc agggacgccg ccgccgtgac cgtcggagac gacgtcgatc 36840 

cggcccaggc catggtcggt gggctcggcc gggtggtggg cgtggagtcc cccgcccggt 36900 

ggggtggcct ggtggacctg cgcgaggccg acgccgactc ggcccggtcg ctggccgcca 36960 

tactggccga cccgcgcggc gaggagcagt tcgcgatccg gcccgacggc gtcaccgtcg 37020 

cccgtctcgt cccggcaccg gcccgcgcgg cgggtacccg gtggacgccg cgcgggaccg 37080 

tcctggtcac cggcggcacc ggcggcatcg gcgcgcacct ggcccgctgg ctcgccggtg 37140 

cgggcgccga gcacctggtg ctgctcaaca ggcggggagc ggaggcggcc ggtgccgccg 37200 

acctgcgtga cgaactggtc gcgctcggca cgggagtcac catcacggcc tgcgacgtcg 37260 

ccgaccgcga ccggttggcg gccgtcctcg acgccgcacg ggcgcaggga cgggtggtca 37320 

cggcggtgtt ccacgccgcc gggatctccc ggtccacagc ggtacaggag ctgaccgaga 37380 

gcgagttcac cgagatcacc gacgcgaagg tgcggggtac ggcgaacctg gccgaactct 37440 

gtcccgagct ggacgccctc gtgctgttct cctcgaacgc ggcggtgtgg ggcagcccgg 37500 

ggctggcctc ctacgcggcg ggcaacgcct tcctcgacgc cttcgcccgt cgtggtcggc 37560 

gcagtgggct gccggtcacc tcgatcgcct ggggtctgtg ggccgggcag aacatggccg 37620

gtaccgaggg cggcgactac ctgcgcagcc agggcctgcg cgccatggac ccgcagcggg 37680 

cgatcgagga gctgcggacc accctggacg ccggggaccc gtgggtgtcg gtggtggacc 37740 

tggaccggga gcggttcgtc gaactgttca ccgccgcccg ccgccggccc ctcttcgacg 37800 

aactcggtgg ggtccgcgcc ggggccgagg agaccggtca ggaatcggat ctcgcccggc 37860 

ggctggcgtc gatgccggag gccgaacgtc acgagcatgt cgcccggctg gtccgagccg 37920 

aggtggcagc ggtgctgggc cacggcacgc cgacggtgat cgagcgtgac gtcgccttcc 37980

gtgacctggg attcgactcc atgaccgccg tcgacctgcg gaaccggctc gcggcggtga 38040 

ccggggtccg ggtggccacg accatcgtct tcgaccaccc gacagtggac cgcctcaccg 38100 

cgcactacct ggaacgactc gtcggtgagc cggaggcgac gaccccggct gcggcggtcg 38160 

tcccgcaggc acccggggag gccgacgagc cgatcgcgat cgtcgggatg gcctgccgcc 38220 

tcgccggtgg agtgcgtacc cccgaccagt tgtgggactt catcgtcgcc gacggcgacg 38280 

cggtcaccga gatgccgtcg gaccggtcct gggacctcga cgcgctgttc gacccggacc 38340 

ccgagcggca cggcaccagc tactcccggc acggcgcgtt cctggacggg gcggccgact 38400 

tcgacgcggc gttcttcggg atctcgccgc gtgaggcgtt ggcgatggat ccgcagcagc 38460 

ggcaggtcct ggagacgacg tgggagctgt tcgagaacgc cggcatcgac ccgcactccc 38520 

tgcgcggtac ggacaccggt gtcttcctcg gcgctgcgta ccaggggtac ggccagaacg 38580 

cgcaggtgcc gaaggagagt gagggttacc tgctcaccgg tggttcctcg gcggtcgcct 38640 

ccggtcggat cgcgtacgtg ttggggttgg aggggccggc gatcactgtg gacacggcgt 38700 

gttcgtcgtc gcttgtggcg ttgcacgtgg cggccgggtc gctgcgatcg ggtgactgtg 38760 

ggctcgcggt ggcgggtggg gtgtcggtga tggccggtcc ggaggtgttc accgagttct 38820 

ccaggcaggg cgcgctggcc cccgacggtc ggtgcaagcc cttctccgac caggccgacg 38880 

ggttcggatt cgccgagggc gtcgctgtgg tgctcctgca gcggttgtcg gtggcggtgc 38940 

gggaggggcg tcgggtgttg ggtgtggtgg tgggttcggc ggtgaatcag gatggggcga 39000 

gtaatgggtt ggcggcgccg tcgggggtgg cgcagcagcg ggtgattcgg cgggcgtggg 39060 

gtcgtgcggg tgtgtcgggt ggggatgtgg gtgtggtgga ggcgcatggg acggggacgc 39120 

ggttggggga tccggtggag ttgggggcgt tgttggggac gtatggggtg ggtcggggtg 39180 

gggtgggtcc ggtggtggtg ggttcggtga aggcgaatgt gggtcatgtg caggcggcgg 39240 

cgggtgtggt gggtgtgatc aaggtggtgt tggggttggg tcgggggttg gtgggtccga 39300 

tggtgtgtcg gggtgggttg tcggggttgg tggattggtc gtcgggtggg ttggtggtgg 39360 

cggatggggt gcgggggtgg ccggtgggtg tggatggggt gcgtcggggt ggggtgtcgg 39420 

cgtttggggt gtcggggacg aatgctcatg tggtggtggc ggaggcgccg gggtcggtgg 39480 

tgggggcgga acggccggtg gaggggtcgt cgcgggggtt ggtgggggtg gctggtggtg 39540 

tggtgccggt ggtgctgtcg gcaaagaccg aaaccgccct gaccgagctc gcccgacgac 39600 

tgcacgacgc cgtcgacgac accgtcgccc tcccggcggt ggccgccacc ctcgccaccg 39660 

gacgcgccca cctgccctac cgggccgccc tgctggcccg cgaccacgac gaactgcgcg 39720 

acaggctgcg ggcgttcacc actggttcgg cggctcccgg tgtggtgtcg ggggtggcgt 39780 

cgggtggtgg tgtggtgttt gtttttcctg gtcagggtgg tcagtgggtg gggatggcgc 39840 

gggggttgtt gtcggttccg gtgtttgtgg agtcggtggt ggagtgtgat gcggtggtgt 39900 

cgtcggtggt ggggttttcg gtgttggggg tgttggaggg tcggtcgggt gcgccgtcgt 39960 

tggatcgggt ggatgtggtg cagccggtgt tgttcgtggt gatggtgtcg ttggcgcggt 40020

tgtggcggtg gtgtggggtt gtgcctgcgg cggtggtggg tcattcgcag ggggagatcg 40080 

cggcggcggt ggtggcgggg gtgttgtcgg tgggtgatgg tgcgcgggtg gtggcgttgc 40140 

gggcgcgggc gttgcgggcg ttggccggcc acggcggcat ggtctccctc gcggtctccg 40200 

ccgaacgcgc ccgggagctg atcgcaccct ggtccgaccg gatctcggtg gcggcggtca 40260 

actccccgac ctcggtggtg gtctcgggtg acccacaggc cctcgccgcc ctcgtcgccc 40320 

actgcgccga gaccggtgag cgggccaaga cgctgcctgt ggactacgcc tcccactccg 40380 
cccacgtcga acagatccgc gacacgatcc tcaccgacct ggccgacgtc acggcgcgcc 40440

gacccgacgt cgccctctac tccacgctgc acggcgcccg gggcgccggc acggacatgg 40500 

acgcccggta ctggtacgac aacctgcgct caccggtgcg cttcgacgag gccgtcgagg 40560 

ccgccgtcgc cgacggctac cgggtcttcg tcgagatgag cccacacccg gtcctcaccg 40620

ccgcggtgca ggagatcgac gacgagacgg tggccatcgg ctcgctgcac cgggacaccg 40680

gcgagcggca cctggtcgcc gaactcgccc gggcccacgt gcacggcgta ccagtggact 40740 

ggcgggcgat cctccccgcc acccacccgg ttcccctgcc gaactacccg ttcgaggcga 40800

cccggtactg gctcgccccg acggcggccg accaggtcgc cgaccaccgc taccgcgtcg 40860

actggcggcc cctggccacc accccggcgg agctgtccgg cagctacctc gtcttcggcg 40920 

acgccccgga gaccctcggc cacagcgtcg agaaggccgg cgggctcctc gtcccggtgg 40980

ccgctcccga ccgggagtcc ctcgcggtcg ccctggacga ggcggccgga cgactcgccg 41040 

gtgtgctctc cttcgccgcc gacaccgcca cccacctggc ccggcaccga ctcctcggcg 41100 

aggccgacgt cgaggcccca ctctggctgg tcaccagcgg cggcgtcgca ctcgacgacc 41160 

acgacccgat cgactgcgac caggcaatgg tgtgggggat cggacgggtg atgggtctgg 41220 

agaccccgca ccggtggggc ggcctggtgg acgtgaccgt cgaacccacc gccgaggacg 41280 

gggtggtctt cgccgccctc ctggccgccg acgaccacga ggaccaggtg gcgctgcgcg 41340

acggcatccg ccacggccga cggctcgtcc gcgccccgct gaccacccga aacgccaggt 41400

ggacaccggc gggcacggcg ctcgtcacgg gcggtacggg tgccctcggc ggccacgtcg 41460

cgcggtacct ggcccggtcc ggggtgaccg atctcgtcct gctcagcagg agcggccccg 41520 

acgcacccgg tgccgccgaa ctggccgccg aactggccga cctcggggcc gagccgagag 41580 

tcgaggcgtg cgacgtcacc gacgggccac gcctgcgcgc cctggtgcag gagctacggg 41640 

aacaggaccg gccggtccgg atcgtcgtcc acaccgcagg ggtgcccgac tcccgtcccc 41700 

tcgaccggat cgacgaactg gagtcggtca gcgccgcgaa ggtgaccggg gcgcggctgc 41760

tcgacgagct ctgcccggac gccgacacct tcgtcctgtt ctcctcgggg gcgggagtgt 41820 

ggggtagcgc gaacctgggc gcgtacgcgg cagccaacgc ctacctggac gccctggccc 41880

accgccgccg ccaggcgggc cgggccgcga cctcggtcgc ctggggggcg tgggccggcg 41940 

acggcatggc caccggcgac ctcgacgggc tgacccggcg cggtctgcgg gcgatggcac 42000 

cggaccgggc gctgcgcgcc tgcaccaggc gttggaccac ccacgacacc tgtgtgtcgg 42060 

tagccgacgt cgactgggac cgcttcgccg tgggtttcac cgccgcccgg cccagacccc 42120 

tgatcgacga actcgtcacc tccgcgccgg tggccgcccc caccgctgcg gcggccccgg 42180 

tcccggcgat gaccgccgac cagctactcc agttcacgcg ctcgcacgtg gccgcgatcc 42240 

tcggtcacca ggacccggac gcggtcgggt tggaccagcc cttcaccgag ctgggcttcg 42300

actcgctcac cgccgtcggc ctgcgcaacc agctccagca ggccaccggg cggacgctgc 42360 

ccgccgccct ggtgttccag caccccacgg tacgcagact cgccgaccac ctcgcgcagc 42420 

agctcgacgt cggcaccgcc ccggtcgagg cgacgggcag cgtcctgcgg gacggctacc 42480

ggcgggccgg gcagaccggc gacgtccggt cgtacctgga cctgctggcg aacctgtcgg 42540 

agttccggga gcggttcacc gacgcggcga gcctgggcgg acagctggaa ctcgtcgacc 42600 

tggccgacgg atccggcccg gtcactgtga tctgttgcgc gggcactgcg gcgctctccg 42660 

ggccgcacga gttcgcccga ctcgcctcgg cgctgcgcgg caccgtgccg gtgcgcgccc 42720 

tcgcgcaacc cgggtacgag gcgggtgaac cggtgccggc gtcgatggag gcagtgctcg 42780 

gggtgcaggc ggacgcggtc ctcgcggcac agggcgacac gccgttcgtg ctggtcggac 42840 

actcggcggg ggccctgatg gcgtacgccc tggcgaccga gctggccgac cggggccacc 42900

cgccacgtgg cgtcgtgctc ctcgacgtgt acccacccgg tcaccaggag gcggtgcacg 42960 

cctggctcgg cgagctgacc gccgccctgt tcgaccacga gaccgtacgg atggacgaca 43020 

cccggctcac ggccctgggg gcgtacgaca ggctgaccgg caggtggcgt ccgagggaca 43080

ccggtctgcc cacgctggtg gtggccgcca gcgagccgat gggggagtgg ccggacgacg 43140 

gttggcagtc cacgtggccg ttcgggcacg acagggtcac ggtgcccggt gaccacttct 43200

cgatggtgca ggagcacgcc gacgcgatcg cgcggcacat cgacgcctgg ttgagcgggg 43260 

agagggcatg aacacgaccg atcgcgccgt gctgggccga cgactccaga tgatccgggg 43320

actgtactgg ggttacggca gcaacggaga cccgtacccg atgctgttgt gcgggcacga 43380 

cgacgacccg caccgctggt accgggggct gggcggatcc ggggtccggc gcagccgtac 43440 

cgagacgtgg gtggtgaccg accacgccac cgccgtgcgg gtgctcgacg acccgacctt 43500 

cacccgggcc accggccgga cgccggagtg gatgcgggcc gcgggcgccc cggcctcgac 43560 

ctgggcgcag ccgttccgtg acgtgcacgc cgcgtcctgg gacgccgaac tgcccgaccc 43620 

gcaggaggtg gaggaccggc tgacgggtct cctgcctgcc ccggggaccc gcctggacct 43680 

ggtccgcgac ctcgcctggc cgatggcgtc gcggggggtc ggcgcggacg accccgacgt 43740 

gctgcgcgcc gcgtgggacg cccgggtcgg cctcgacgcc cagctcaccc cgcagcccct 43800 

ggcggtgacc gaggcggcga tcgccgcggt gcccggggac ccgcaccggc gggcgctgtt 43860

caccgccgtc gagatgacag ccaccgcgtt cgtcgacgcg gtgctggcgg tgaccgccac 43920 

ggcgggggcg gcccagcgtc tcgccgacga ccccgacgtc gccgcccgtc tcgtcgcgga 43980

ggtgctgcgc ctgcatccga cggcgcacct ggaacggcgt accgccggca ccgagacggt 44040
ggtgggcgag cacacggtcg cggcgggcga cgaggtcgtc gtggtggtcg ccgccgccaa 44100

ccgtgacgcg ggggtcttcg ccgacccgga ccgcctcgac ccggaccggg ccgacgccga 44160 

ccgggccctg tccgcccagc gcggtcaccc cggccggttg gaggagctgg tggtggtcct 44220

gaccaccgcc gcactgcgca gcgtcgccaa ggcgctgccc ggtctcaccg ccggtggccc 44280

ggtcgtcagg cgacgtcgtt caccggtcct gcgagccacc gcccactgcc cggtcgaact 44340 

ctgaggtgcc tgcgatgcgc gtcgtcttct cctccatggc cagcaagagc cacctgttcg 44400 

gtctcgttcc cctcgcctgg gccttccgcg cggcgggcca cgaggtacgg gtcgtcgcct 44460

caccggctct caccgacgac atcacggcgg ccggactgac ggccgtaccg gtcggcaccg 44520 

acgtcgacct tgtcgacttc atgacccacg ccgggtacga catcatcgac tacgtccgca 44580 

gcctggactt cagcgagcgg gacccggcca cctccacctg ggaccacctg ctcggcatgc 44640

agaccgtcct caccccgacc ttctacgccc tgatgagccc ggactcgctg gtcgagggca 44700

tgatctcctt ctgtcggtcg tggcgacccg actggtcgtc tggaccgcag accttcgccg 44760

cgtcgatcgc ggcgacggtg accggcgtgg cccacgcccg actcctgtgg ggacccgaca 44820 

tcacggtacg ggcccggcag aagttcctcg ggctgctgcc cggacagccc gccgcccacc 44880

gggaggaccc cctcgccgag tggctcacct ggtctgtgga gaggttcggc ggccgggtgc 44940

cgcaggacgt cgaggagctg gtggtcgggc agtggacgat cgaccccgcc ccggtcggga 45000 

tgcgcctcga caccgggctg aggacggtgg gcatgcgcta cgtcgactac aacggcccgt 45060

cggtggtgcc ggactggctg cacgacgagc cgacccgccg acgggtctgc ctcaccctgg 45120

gcatctccag ccgggagaac agcatcgggc aggtctccgt cgacgacctg ttgggtgcgc 45180

tcggtgacgt cgacgccgag atcatcgcga cagtggacga gcagcagctc gaaggcgtcg 45240 

cccacgtccc ggccaacatc cgtacggtcg ggttcgtccc gatgcacgca ctgctgccga 45300 

cctgcgcggc gacggtgcac cacggcggtc ccggcagctg gcacaccgcc gccatccacg 45360 

gcgtgccgca ggtgatcctg cccgacggct gggacaccgg ggtccgcgcc cagcggaccg 45420 

aggaccaggg ggcgggcatc gccctgccgg tgcccgagct gacctccgac cagctccgcg 45480

aggcggtgcg gcgggtcctg gacgatcccg ccttcaccgc cggtgcggcg cggatgcggg 45540 

ccgacatgct cgccgagccg tcccccgccg aggtcgtcga cgtctgtgcg gggctggtcg 45600

gggaacggac cgccgtcgga tgagcaccga cgccacccac gtccggctcg gccggtgcgc 45660 

cctgctgacc agccggctct ggctgggtac ggcagccctc gccggccagg acgacgccga 45720 

cgcagtacgc ctgctcgacc acgcccgttc ccggggcgtc aactgcctcg acaccgccga 45780 

cgacgactct gcgtcgacca gtgcccaggt cgccgaggag tcggtcggcc ggtggttggc 45840 

cggggacacc ggtcggcggg aggagaccgt cctgtcggtg acggtgggtg tcccaccggg 45900 

cgggcaggtc ggcgggggcg gcctctccgc ccggcagatc atcgcctcct gtgagggctc 45960 

cctgcggcgt ctcggtgtcg accacgtcga cgtccttcac ctgccccggg tggaccgggt 46020

ggagccgtgg gacgaggtct ggcaggcggt ggacgccctc gtggccgccg gaaaggtctg 46080

ttacgtcggg tcgtcgggct tccccggatg gcacatcgtc gccgcccagg agcacgccgt 46140

ccgccgtcac cgcctcggcc tggtgtccca ccagtgtcgg tacgacctga cgtcgcgcca 46200

tcccgaactg gaggtcctgc ccgccgcgca ggcgtacggg ctcggggtct tcgccaggcc 46260

gacccgcctc ggcggtctgc tcggcggcga cggtccgggc gccgcagccg cacgggcgtc 46320

gggacagccg acggcactgc gctcggcggt ggaggcgtac gaggtgttct gcagagacct 46380

cggcgagcac cccgccgagg tcgcactggc gtgggtgctg tcccggcccg gtgtggcggg 46440 

ggcggtcgtc ggtgcgcgga cgcccggacg gctcgactcc gcgctccgcg cctgcggcgt 46500

cgccctcggc gcgacggaac tcaccgccct ggacgggatc ttccccgggg tcgccgcagc 46560

aggggcggcc ccggaggcgt ggctacggtg agagcccgcc cctgacctgc gggaacccgt 46620

gtcggtgcgg cgggacggcc gccgcggtcc ccgccccggt cagccggtgg gggtgagccg 46680

cagcaggtcc ggcgccaccg actcggccac ctccccgacg tggtcggcga ggtagaagtg 46740

cccgcccggg aaggtccggg tacggccggg gactaccgag tacggcagcc agcgttgggc 46800

gtcctccacc gtcgtcaacg ggtcggtgtc accgcagagg gtggtgatgc cggcccgcag 46860

cggcggcccg gcctgccagg cgtaggagcg cagcacccgg tggtcggccc gcagcaccgg 46920

cagcgacatg tccaacagcc cctggtcggc caatgcggcc tcgctgaccc cgagcctgcg 46980 

catctgctcg acgagtccgt cctcgtcggg caggtcggtg cgccgctcgt ggacccgggg 47040 

ggcggtctgc ccggagacga acaaccgcag cggtcgcacc cccggacgag cctccaggcg 47100 

acgggcggtc tcgtaggcga ccagggcgcc catgctgtga ccgaacaggg cgaacggaac 47160 

ctcgccgacg aggtcgcgca gcacggccgc gacctcgtcg gcgatctccc cggcggtgcc 47220 

gagagcccgc tcgtcacgtc ggtcctgccg gcccgggtac tgcaccgccc acacgtcgac 47280 

ctccggggcc agtgcccggg cgaggtcgag gtacgagtcg gcggcggctc ccgcgtgcgg 47340 

gaagcagtac agccgggccc ggtgtccgtc ggcggacccg aaccgccgca accaggtgtt 47400 

catcggtgtc tcatccgttc ggtcgcaccg gcaggtggtc gatgccgcgc agcaggagcg 47460

accgccgcca gacaacctcg tcggagggga agcccagcga cagcttcggg aagcggtcga 47520 

acagggcccc cagggcgacc tctccctcca gcttggccag cgggcggccc atgcagtagt 47580 

ggatgccgtg cccgaaggtg aggtgtcccc ggctgtccct ggtgacgtcg aaccggtcgg 47640

ggtcggggaa ctgtcccggg tcgcggttgg ccgccccgtt ggcgatcagg acggtgctgt 47700 
acgccgggat cgtcaccccg ccgatctcca cctcggcggt ggcgaaccgg gtggtggtct 47760

ccggtggggc ctggtagcgc aggatctcct ccaccgctcc gggcagcagt gccgggtcct 47820 

tccggaccag cgcgagctgg tcggggtggg tcagcagcag gtaggtgccg atcccgatga 47880 

ggctcaccga cgcctcgaat cccgccagca gcagcaccag cgcgatggag gtgagttcgt 47940 

cgcggctgag ccggtcggcg tcgtcgtcct ggacccggat c 47981 

<210> 2 
<211> 48 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 2 

Met Gly Asp Arg Val Asn Gly His Ala Thr Pro Glu Ser Thr Gln Ser

1 5 10 15

Ala Ile Arg Phe Leu Thr Arg His Gly Gly Pro Pro Thr Ala Thr Asp

20 25 30 

Asp Val His Asp Trp Leu Ala His Arg Ala Ala Glu His Arg Leu Glu 
35 40 45 

<210> 3 
<211> 377 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 3

Met Ala Val Gly Asp Arg Arg Arg Leu Gly Arg Glu Leu Gln Met Ala
1 5 10 15

Arg Gly Leu Tyr Trp Gly Phe Gly Ala Asn Gly Asp Leu Tyr Ser Met
20 25 30

Leu Leu Ser Gly Arg Asp Asp Asp Pro Trp Thr Trp Tyr Glu Arg Leu
35 40 45

Arg Ala Ala Gly Arg Gly Pro Tyr Ala Ser Arg Ala Gly Thr Trp Val
50 55 60

Val Gly Asp His Arg Thr Ala Ala Glu Val Leu Ala Asp Pro Gly Phe
65 70 75 80

Thr His Gly Pro Pro Asp Ala Ala Arg Trp Met Gln Val Ala His Cys
85 90 95

Pro Ala Ala Ser Trp Ala Gly Pro Phe Arg Glu Phe Tyr Ala Arg Thr
100 105 110

Glu Asp Ala Ala Ser Val Thr Val Asp Ala Asp Trp Leu Gln Gln Arg
115 120 125

Cys Ala Arg Leu Val Thr Glu Leu Gly Ser Arg Phe Asp Leu Val Asn
130 135 140

Asp Phe Ala Arg Glu Val Pro Val Leu Ala Leu Gly Thr Ala Pro Ala
145 150 155 160

Leu Lys Gly Val Asp Pro Asp Arg Leu Arg Ser Trp Thr Ser Ala Thr
165 170 175

Arg Val Cys Leu Asp Ala Gln Val Ser Pro Gln Gln Leu Ala Val Thr
180 185 190

Glu Gln Ala Leu Thr Ala Leu Asp Glu Ile Asp Ala Val Thr Gly Gly
195 200 205

Arg Asp Ala Ala Val Leu Val Gly Val Val Ala Glu Leu Ala Ala Asn
210 215 220

Thr Val Gly Asn Ala Val Leu Ala Val Thr Glu Leu Pro Glu Leu Ala
225 230 235 240

Ala Arg Leu Ala Asp Asp Pro Glu Thr Ala Thr Arg Val Val Thr Glu
245 250 255

Val Ser Arg Thr Ser Pro Gly Val His Leu Glu Arg Arg Thr Ala Ala
260 265 270

Ser Asp Arg Arg Val Gly Gly Val Asp Val Pro Thr Gly Gly Glu Val
275 280 285

Thr Val Val Val Ala Ala Ala Asn Arg Asp Pro Glu Val Phe Thr Asp
290 295 300

Pro Asp Arg Phe Asp Val Asp Arg Gly Gly Asp Ala Glu Ile Leu Ser
305 310 315 320

Ser Arg Pro Gly Ser Pro Arg Thr Asp Leu Asp Ala Leu Val Ala Thr
325 330 335

Leu Ala Thr Ala Ala Leu Arg Ala Ala Ala Pro Val Leu Pro Arg Leu
340 345 350

Ser Arg Ser Gly Pro Val Ile Arg Arg Arg Arg Ser Pro Val Ala Arg
355 360 365

Gly Leu Ser Arg Cys Pro Val Glu Leu
370 375



<210> 4 
<211> 436 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 4

Met Arg Val Val Phe Ser Ser Met Ala Val Asn Ser His Leu Phe Gly
1 5 10 15

Leu Val Pro Leu Ala Ser Ala Phe Gln Ala Ala Gly His Glu Val Arg
20 25 30

Val Val Ala Ser Pro Ala Leu Thr Asp Asp Val Thr Gly Ala Gly Leu
35 40 45

Thr Ala Val Pro Val Gly Asp Asp Val Glu Leu Val Glu Trp His Ala
50 55 60

His Ala Gly Gln Asp Ile Val Glu Tyr Met Arg Thr Leu Asp Trp Val
65 70 75 80

Asp Gln Ser His Thr Thr Met Ser Trp Asp Asp Leu Leu Gly Met Gln
85 90 95

Thr Thr Phe Thr Pro Thr Phe Phe Ala Leu Met Ser Pro Asp Ser Leu
100 105 110

Ile Asp Gly Met Val Glu Phe Cys Arg Ser Trp Arg Pro Asp Trp Ile
115 120 125

Val Trp Glu Pro Leu Thr Phe Ala Ala Pro Ile Ala Ala Arg Val Thr
130 135 140

Gly Thr Pro His Ala Arg Met Leu Trp Gly Pro Asp Val Ala Thr Arg
145 150 155 160

Ala Arg Gln Ser Phe Leu Arg Leu Leu Ala His Gln Glu Val Glu His
165 170 175

Arg Glu Asp Pro Leu Ala Glu Trp Phe Asp Trp Thr Leu Arg Arg Phe
180 185 190

Gly Asp Asp Pro His Leu Ser Phe Asp Glu Glu Leu Val Leu Gly Gln
195 200 205

Trp Thr Val Asp Pro Ile Pro Glu Pro Leu Arg Ile Asp Thr Gly Val
210 215 220

Arg Thr Val Gly Met Arg Tyr Val Pro Tyr Asn Gly Pro Ser Val Val
225 230 235 240

Pro Ala Trp Leu Leu Arg Glu Pro Glu Arg Arg Arg Val Cys Leu Thr
245 250 255

Leu Gly Gly Ser Ser Arg Glu His Gly Ile Gly Gln Val Ser Ile Gly
260 265 270

Glu Met Leu Asp Ala Ile Ala Asp Ile Asp Ala Glu Phe Val Ala Thr
275 280 285

Phe Asp Asp Gln Gln Leu Val Gly Val Gly Ser Val Pro Ala Asn Val
290 295 300

Arg Thr Ala Gly Phe Val Pro Met Asn Val Leu Leu Pro Thr Cys Ala
305 310 315 320

Ala Thr Val His His Gly Gly Thr Gly Ser Trp Leu Thr Ala Ala Ile
325 330 335

His Gly Val Pro Gln Ile Ile Leu Ser Asp Ala Asp Thr Glu Val His
340 345 350

Ala Lys Gln Leu Gln Asp Leu Gly Ala Gly Leu Ser Leu Pro Val Ala
355 360 365

Gly Met Thr Ala Glu His Leu Arg Gly Ala Ile Glu Arg Val Leu Asp
370 375 380

Glu Pro Ala Tyr Arg Leu Gly Ala Glu Arg Met Arg Asp Gly Met Arg
385 390 395 400

Thr Asp Pro Ser Pro Ala Gln Val Val Gly Ile Cys Gln Asp Leu Ala
405 410 415

Ala Asp Arg Ala Ala Arg Gly Arg Gln Pro Arg Arg Thr Ala Glu Pro
420 425 430

His Leu Pro Arg
435



<210> 5 
<211> 390 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 5 



Mat- 
net 


v ax 


1 Fii 


Car 


Thr 
X Hi. 


Asn 


Leu 




1 
















Ser 


Leu 


Thr 


vj j. y 


Met 


Arg 


Phe 


Val 


















nib 


val 


Leu 


C ^ 


Arg 


Leu 


lie 


Pro 
















40 


Leu 


Asp 


ni a 
nld 


true 


i rp 


Gin 


Thr 


Thr 














•j j 




rne 




Leu 




\JJ-y 


Phe 


Val 


Leu 


DD 










70 








Val 


irp 




rue 


Trr» 
1 *P 


Arg 


Arg 










O *J 








His 


Leu 


Val 


Thr 


Ala 


Phe 


Ala 


Ala 








100 










Gin 


Ala 


Val 


Ser 


Gly 


Glu 


Ala 


Leu 






115 










120 


Ala 


Trp 


Phe 


Pro 


Ala 


Leu 


Glu 


He 




130 










135 




Trp 


Ser 


Leu 


Ala 


Cys 


Glu 


Ala 


Phe 


145 










150 






Leu 


Phe 


Trp 


He 


Ser 


Gly 


He 


Arg 










165 








Ala 


Val 


Val 


Phe 


Ala 


Ala 


He 


Trp 








180 










Leu 


Leu 


Pro 


Ser 


Ser 


Pro 


Pro 


Leu 






195 










200 


He 


Gin 


Asp 


Trp 


Phe 


Leu 


Tyr 


Thr 




210 










215 




Phe 


He 


Leu 


Gly 


He 


He 


Leu 


Ala 


225 










230 






He 


Asn 


Val 


Gly 


Leu 


Leu 


Pro 


Ala 










245 








Val 


Ala 


Ser 


Leu 


Phe 


Leu 


Pro 


Gly 








260 










Met 


He 


Leu 


Pro 


Leu 


Val 


Leu 


He 






275 










280 


Leu 


Gin 


Gin 


Lys 


Arg 


Thr 


Phe 


Met 




290 










295 




Gly 


Asp 


Val 


Ser 


Phe 


Ala 


Leu 


Tyr 


305 










310 







Thr 


Thr 
1 0 


Ala 


Arg 


Pro 


Ala 


Leu 

X -J 


Asn 


Ala 


Ala 


Phe 


Leu 


Val 


Phe 


Phe 


Thr 


25 










30 






Asn 


Ser 


Tvr 


Val 


Tvr 
j 

45 


Ala 


Asp 


Glv 


Gly 


Arg 


Val 


Gly 
fin 


Val 


Ser 


Phe 


Phe 


Thr 


Trp 


Ser 


Ala 


Arg 


Ala 


Ser 


Asp 


Arg 


Val 


Cys 


Lys 


Leu 


Phe 


Pro 

7 D 


Asn 


Val 


Val 


Leu 


Phe 


Leu 


Val 


Thr 


Gly 


105 










110 






He 


Pro 


Asn 


Leu 


Leu 
125 


Leu 


He 


His 


Ser 


Phe 


Gly 


He 
140 


Asn 


Pro 


Val 


Ser 


Phe 


Tyr 


Leu 
155 


Cys 


Phe 


Pro 


Leu 


Phe 
160 


Pro 


Glu 
170 


Arg 


Leu 


Trp 


Ala 


Trp 
175 


Ala 


Ala 


Val 


Pro 


Val 


Val 


Ala 


Asp 


Leu 


185 










190 






He 


Pro 


Gly 


Leu 


Glu 
205 


Tyr 


Ser 


Ala 


Phe 


Pro 


Ala 


Thr 
220 


Arg 


Ser 


Leu 


Glu 


Arg 


He 


Leu 
235 


He 


Thr 


Gly 


Arg 


Trp 
240 


Val 


Leu 
250 


Leu 


Phe 


Pro 


Val 


Phe 
255 


Phe 


Val 


Tyr 


Ala 


He 


Ser 


Ser 


Ser 


Met 


265 










270 






He 


Ala 


Ser 


Gly 


Ala 
285 


Thr 


Ala 


Asp 


Arg 


Asn 


Arg 


Val 
300 


Met 


Val 


Trp 


Leu 


Met 


Val 


His 
315 


Phe 


Leu 


Val 


He 


Val 
320 
Tyr Gly Ala Asp Leu Leu Gly Phe Ser Gln Thr Glu Asp Ala Pro Leu
325 330 335

Gly Leu Ala Leu Phe Met Ile Ile Pro Phe Leu Ala Val Ser Leu Val
340 345 350

Leu Ser Trp Leu Leu Tyr Arg Phe Val Glu
355 360
Tyr


Ala 


Gin 


His 


130 








135 






Leu Arg 


Gin Ala 


Thr 


Glu 


Asp 


Gly 


Val 


145 






150 








Leu Pro 


Ala He 


Tyr 


Gly 


His 


Ser 


Gly 






165 










Gly Val 


Val Thr 


Ala 


Met 


He 


Arg 


Arg 


180 










185 


Thr Met 


Trp His 


Glu 


Gly 


Ser 


Val 


Arg 




195 








200 




Asp Val 


Ala Thr 


Ala 


Phe 


Thr 


Ala 


Ala 


210 








215 






Val Gly Asp Val Trp 


Thr 


Pro 


Ser 


Ala 


225 






230 








Glu He 


Phe Glu 


Thr 


Val 


Ala 


Ala 


Ser 






245 










Pro Ala 


Val Pro 


Val 


Val 


Ser 


Val 


Pro 




260 










2 65 


Asn Asp 


Phe Arg 


Ser 


Asp 


Asp 


Phe 


Asp 




275 








280 




Thr Gly Trp His 


Pro 


Arg 


Val 


Pro 


Leu 


290 








295 






Val Ala 


Ala Leu 


He 


Ser 


Thr 


Lys 


Glu 



305 310 

<210> 13 
<211> 3546 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 13 



Met Val Asp 


Val 


Pro 


Asp 


Leu 


Leu 


Gly 


1 




5 










Pro Leu Pro 


Phe 
20 


Pro 


Trp 


Pro 


Leu 


Cys 
25 


Arg Ala Arg 


Ala 


Arg 


Gin 


Leu 


His 


Ala 


35 










40 




Asp Asp Val 


Val 


Ala 


Val 


Gly 


Ala 


Ala 


50 








55 






Gin Asp Gly 


Pro 


His 


Arg 


Ala 


Val 


Val 


65 






70 








Leu Thr Ala 


Ala 


Leu 
85 


Ala 


Ala 


Leu 


Ala 


Val Val~ Arg 


Gly 
100 


Val 


Ala 


Arg 


Pro 


Thr 
105 
Val Ser


Gly 


Phe Val Gly 


Ser 


10 




15 




Pro Leu 


Arg 


Leu Arg Ala 


Val 






30 




Pro Gly Ser 


Ala Gly He 


Glu 






45 




Gly Arg 


Val 


Ala Gin Val 


Val 




60 






Val Ala 


Tyr 


Ala Ala Gly 


Gly 


75 






80 


Pro Glu 


Ala 


Glu Arg Val 


Asn 


90 




95 




Ala Leu Arg 


Ala Arg Pro 


Gly 






110 




Thr Thr 


Gin 


Ala Ala Asn 


Pro 






125 




Lys He 


Glu 


Ala Glu Arg 


He 




140 






Val Asp 


Gly 


Val He Leu 


Arg 


155 






160 


Pro Ser Gly 


Gin Thr Gly 


Arg 


170 




175 




Ala Leu 


Ala 


Gly Glu Pro 


He 






190 




Arg Asn 


Leu 


Leu His Val 


Glu 




205 




Leu His 


Asn 


His Glu Ala 


Leu 




220 






Asp Glu Ala 


Arg Pro Leu 


Gly 


235 






240 


Val Ala 


Arg 


Gin Thr Gly 


Asn 


250 




255 




Pro Pro 


Glu 


Asn Ala Glu 


Ala 






270 




Ser Thr 


Glu 


Phe Arg Thr 


Leu 






285 




Ala Glu Gly 


He Asp Arg 


Thr 




300 







Thr Arg Thr 


Pro 


His 


Pro 


Gly 


10 






15 




Gly His Asn 


Glu 


Pro 


Glu 


Leu 




30 






Tyr Leu Glu 


Gly 
45 


He 


Ser 


Glu 


Leu Ala Arg 


Glu Thr Arg 


Ala 


60 










Val Ala Ser 


Ser 


Val 


Thr 


Glu 


75 








80 


Gin Gly Arg 


Pro 


His 


Pro 


Ser 


90 






95 




Ala Pro Val 


Val 


Phe 
110 


Val 


Leu 
Pro Gly Gln Gly Ala Gln Trp Pro Gly Met Ala Thr Arg Leu Leu Ala
115 120 125

Glu Ser Pro Val Phe Ala Ala Ala Met Arg Ala Cys Glu Arg Ala Phe
130 135 140

Asp Glu Val Thr Asp Trp Ser Leu Thr Glu Val Leu Asp Ser Pro Glu
145 150 155 160

His Leu Arg Arg Val Glu Val Val Gln Pro Ala Leu Phe Ala Val Gln
165 170 175

Thr Ser Leu Ala Ala Leu Trp Arg Ser Phe Gly Val Arg Pro Asp Ala
180 185 190

Val Leu Gly His Ser Ile Gly Glu Leu Ala Ala Ala Glu Val Cys Gly
195 200 205

Ala Val Asp Val Glu Ala Ala Ala Arg Ala Ala Ala Leu Trp Ser Arg
210 215 220

Glu Met Val Pro Leu Val Gly Arg Gly Asp Met Ala Ala Val Ala Leu
225 230 235 240

Ser Pro Ala Glu Leu Ala Ala Arg Val Glu Arg Trp Asp Asp Asp Val
245 250 255

Val Pro Ala Gly Val Asn Gly Pro Arg Ser Val Leu Leu Thr Gly Ala
260 265 270

Pro Glu Pro Ile Ala Arg Arg Val Ala Glu Leu Ala Ala Gln Gly Val
275 280 285

Arg Ala Gln Val Val Asn Val Ser Met Ala Ala His Ser Ala Gln Val
290 295 300

Asp Ala Val Ala Glu Gly Met Arg Ser Ala Leu Thr Trp Phe Ala Pro
305 310 315 320

Gly Asp Ser Asp Val Pro Tyr Tyr Ala Gly Leu Thr Gly Gly Arg Leu
325 330 335

Asp Thr Arg Glu Leu Gly Ala Asp His Trp Pro Arg Ser Phe Arg Leu
340 345 350

Pro Val Arg Phe Asp Glu Ala Thr Arg Ala Val Leu Glu Leu Gln Pro
355 360 365

Gly Thr Phe Ile Glu Ser Ser Pro His Pro Val Leu Ala Ala Ser Leu
370 375 380

Gln Gln Thr Leu Asp Glu Val Gly Ser Pro Ala Ala Ile Val Pro Thr
385 390 395 400

Leu Gln Arg Asp Gln Gly Gly Leu Arg Arg Phe Leu Leu Ala Val Ala
405 410 415

Gln Ala Tyr Thr Gly Gly Val Thr Val Asp Trp Thr Ala Ala Tyr Pro
420 425 430

Gly Val Thr Pro Gly His Leu Pro Ser Ala Val Ala Val Glu Thr Asp
435 440 445

Glu Gly Pro Ser Thr Glu Phe Asp Trp Ala Ala Pro Asp His Val Leu
450 455 460

Arg Ala Arg Leu Leu Glu Ile Val Gly Ala Glu Thr Ala Ala Leu Ala
465 470 475 480

Gly Arg Glu Val Asp Ala Arg Ala Thr Phe Arg Glu Leu Gly Leu Asp
485 490 495

Ser Val Leu Ala Val Gln Leu Arg Thr Arg Leu Ala Thr Ala Thr Gly
500 505 510

Arg Asp Leu His Ile Ala Met Leu Tyr Asp His Pro Thr Pro His Ala
515 520 525

Leu Thr Glu Ala Leu Leu Arg Gly Pro Gln Glu Glu Pro Gly Arg Gly
530 535 540

Glu Glu Thr Ala His Pro Thr Glu Ala Glu Pro Asp Glu Pro Val Ala
545 550 555 560

Val Val Ala Met Ala Cys Arg Leu Pro Gly Gly Val Thr Ser Pro Glu
565 570 575

Glu Phe Trp Glu Leu Leu Ala Glu Gly Arg Asp Ala Val Gly Gly Leu
580 585 590

Pro Thr Asp Arg Gly Trp Asp Leu Asp Ser Leu Phe His Pro Asp Pro
595 600 605

Thr Arg Ser Gly Thr Ala His Gln Arg Ala Gly Gly Phe Leu Thr Gly
610 615 620

Ala Thr Ser Phe Asp Ala Ala Phe Phe Gly Leu Ser Pro Arg Glu Ala
625 630 635 640

Leu Ala Val Glu Pro Gln Gln Arg Ile Thr Leu Glu Leu Ser Trp Glu
645 650 655

Val Leu Glu Arg Ala Gly Ile Pro Pro Thr Ser Leu Arg Thr Ser Arg
660 665 670

Thr Gly Val Phe Val Gly Leu Ile Pro Gln Glu Tyr Gly Pro Arg Leu
675 680 685

Ala Glu Gly Gly Glu Gly Val Glu Gly Tyr Leu Met Thr Gly Thr Thr
690 695 700

Thr Ser Val Ala Ser Gly Arg Val Ala Tyr Thr Leu Gly Leu Glu Gly
705 710 715 720

Pro Ala Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Val
725 730 735

His Leu Ala Cys Gln Ser Leu Arg Arg Gly Glu Ser Thr Met Ala Leu
740 745 750

Ala Gly Gly Val Thr Val Met Pro Thr Pro Gly Met Leu Val Asp Phe
755 760 765








Ser Arg 


Met 


Asn 


Ser 


Leu 


Ala 


Pro 


Asp 


Gly 


Arg 


Ser 


Lys 


Ala 


Phe 


Ser 


770 










775 










780 










Ala Ala 


Ala 


Asp 


Gly 


Phe 


Gly 


Met 


Ala 


Glu 


Gly 


Ala 


Gly 


Met 


Leu 


Leu 


785 








790 










795 










800 


Leu Glu 


Arg 


Leu 


Ser 


Asp 


Ala 


Arg 


Arg 


His 


Gly 


His 


Pro 


Val 


Leu 


Ala 








805 










810 










815 




Val lie 


Arg 


Gly 


Thr 


Ala 


Val 


Asn 


Ser 


Asp 


Gly 


Ala 


Ser 


Asn 


Gly 


Leu 






820 










825 










830 






Ser Ala 


Pro 


Asn 


Gly 


Arg 


Ala 


Gin 


Val 


Arg 


Val 


He 


Arg 


Gin 


Ala 


Leu 




835 










840 










845 








Ala Glu 


Ser 


-Gly 


Leu 


Thr 


Pro 


His 


Thr 


Val 


Asp 


Val 


Val 


Glu 


Thr 


His 


850 








855 










860 










Gly Thr 


Gly 


Thr 


Arg 


Leu 


Gly 


Asp 


Pro 


He 


Glu 


Ala 


Arg 


Ala 


Leu 


Ser 


865 








870 










875 










880 


Asp Ala 


Tyr 


Gly 


Gly 


Asp 


Arg 


Glu 


His 


Pro 


Leu 


Arg 


lie 


Gly 


Ser 


Val 






885 










8 90 










895 




Lys Ser 


Asn 


He 


Gly 


His 


Thr 


Gin 


Ala 


Ala 


Ala 


Gly 


Val 


Ala 


Gly 


Leu 




900 










905 










910 






He Lys 


Leu 


Val 


Leu 


Ala 


Met 


Gin 


Ala 


Gly 


Val 


Leu 


Pro 


Arg 


Thr 


Leu 


915 










920 










925 








His Ala 


Asp 


Glu 


Pro 


Ser 


Pro 


Glu 


He 


Asp 


Trp 


Ser 


Ser 


Gly 


Ala 


He 


930 










935 










940 










Ser Leu 


Leu 


Gin 


Glu 


Pro 


Ala 


Ala 


Trp 


Pro 


Ala 


Gly 


Glu 


Arg 


Pro 


Arg 


945 








950 










955 










960 


Arg Ala 


Gly 


Val 


Ser 


Ser 


Phe 


Gly 


He 


Ser 


Gly 


Thr 


Asn 


Ala 


His 


Ala 




965 










970 










975 




He He 


Glu 


Glu 


Ala 


Pro 


Pro 


Thr 


Gly 


Asp 


Asp 


Thr 


Arg 


Pro 


Asp 


Arg 






980 










985 










990 






Met Gly 


Pro 


Val 


Val 


Pro 


Trp 


Val 


Leu 


Ser 


Ala 


Ser 


Thr 


Gly 


Glu 


Ala 


995 










1000 








1005 






Leu Arg 


Ala 


Arg 


Ala 


Ala 


Arg 


Leu 


Ala 


Gly 


His 


Leu 


Arg 


Glu 


His 


Pro 


1010 








1015 








1020 








Asp Gin Asp 


Leu 


Asp 


Asp 


Val 


Ala 


Tyr 


Ser 


Leu 


Ala 


Thr 


Gly 


Arg 


Ala 


1025 








1030 








1035 








1041 


Ala Leu 


Ala 


Tyr 


Arg 


Ser 


Gly 


Phe 


Val 


Pro 


Ala Asp Ala 


Ser 


Thr 


Ala 








1045 








1050 








1055 


Leu Arg 


He 


Leu 


Asp 


Glu 


Leu 


Ala 


Ala 


Gly Gly Ser Gly Asp 


Ala 


Val 






1060 








1065 








1070 




Thr Gly -Thr 


Ala 


Arg Ala 


Pro Gin Arg Val Val 


Phe 


Val 


Phe 


Pro Gly 



1075 1080 1085 
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Gin Gly Trp Gin Trp Ala Gly Met Ala Val Asp Leu Leu Asp Gly Asp 

1090 1095 1100 

Pro Val Phe Ala Ser Val Leu Arg Glu Cys Ala Asp Ala Leu Glu Pro
1105 1110 1115 1120

Tyr Leu Asp Phe Glu Ile Val Pro Phe Leu Arg Ala Glu Ala Gln Arg

1125 1130 1135

Arg Thr Pro Asp His Thr Leu Ser Thr Asp Arg Val Asp Val Val Gln

1140 1145 1150

Pro Val Leu Phe Ala Val Met Val Ser Leu Ala Ala Arg Trp Arg Ala

1155 1160 1165

Tyr Gly Val Glu Pro Ala Ala Val Ile Gly His Ser Gln Gly Glu Ile

1170 1175 1180

Ala Ala Ala Cys Val Ala Gly Ala Leu Ser Leu Asp Asp Ala Ala Arg
1185 1190 1195 1200

Ala Val Ala Leu Arg Ser Arg Val Ile Ala Thr Met Pro Gly Asn Gly

1205 1210 1215

Ala Met Ala Ser Ile Ala Ala Ser Val Asp Glu Val Ala Ala Arg Ile

1220 1225 1230

Asp Gly Arg Val Glu Ile Ala Ala Val Asn Gly Pro Arg Ala Val Val

1235 1240 1245

Val Ser Gly Asp Arg Asp Asp Leu Asp Arg Leu Val Ala Ser Cys Thr

1250 1255 1260

Val Glu Gly Val Arg Ala Lys Arg Leu Pro Val Asp Tyr Ala Ser His
1265 1270 1275 1280

Ser Ser His Val Glu Ala Val Arg Asp Ala Leu His Ala Glu Leu Gly

1285 1290 1295

Glu Phe Arg Pro Leu Pro Gly Phe Val Pro Phe Tyr Ser Thr Val Thr

1300 1305 1310

Gly Arg Trp Val Glu Pro Ala Glu Leu Asp Ala Gly Tyr Trp Phe Arg

1315 1320 1325

Asn Leu Arg His Arg Val Arg Phe Ala Asp Ala Val Arg Ser Leu Ala

1330 1335 1340

Asp Gln Gly Tyr Thr Thr Phe Leu Glu Val Ser Ala His Pro Val Leu
1345 1350 1355 1360

Thr Thr Ala Ile Glu Glu Ile Gly Glu Asp Arg Gly Gly Asp Leu Val

1365 1370 1375

Ala Val His Ser Leu Arg Arg Gly Ala Gly Gly Pro Val Asp Phe Gly

1380 1385 1390

Ser Ala Leu Ala Arg Ala Phe Val Ala Gly Val Ala Val Asp Trp Glu

1395 1400 1405

Ser Ala Tyr Gln Gly Ala Gly Ala Arg Arg Val Pro Leu Pro Thr Tyr

1410 1415 1420

Pro Phe Gln Arg Glu Arg Phe Trp Leu Glu Pro Asn Pro Ala Arg Arg
1425 1430 1435 1440

Val Ala Asp Ser Asp Asp Val Ser Ser Leu Arg Tyr Arg Ile Glu Trp

1445 1450 1455

His Pro Thr Asp Pro Gly Glu Pro Gly Arg Leu Asp Gly Thr Trp Leu

1460 1465 1470

Leu Ala Thr Tyr Pro Gly Arg Ala Asp Asp Arg Val Glu Ala Ala Arg

1475 1480 1485

Gln Ala Leu Glu Ser Ala Gly Ala Arg Val Glu Asp Leu Val Val Glu

1490 1495 1500

Pro Arg Thr Gly Arg Val Asp Leu Val Arg Arg Leu Asp Ala Val Gly
1505 1510 1515 1520

Pro Val Ala Gly Val Leu Cys Leu Phe Ala Val Ala Glu Pro Ala Ala

1525 1530 1535

Glu His Ser Pro Leu Ala Val Thr Ser Leu Ser Asp Thr Leu Asp Leu

1540 1545 1550

Thr Gln Ala Val Ala Gly Ser Gly Arg Glu Cys Pro Ile Trp Val Val

1555 1560 1565

Thr Glu Asn Ala Val Ala Val Gly Pro Phe Glu Arg Leu Arg Asp Pro

1570 1575 1580

Ala His Gly Ala Leu Trp Ala Leu Gly Arg Val Val Ala Leu Glu Asn
1585 1590 1595 1600

Pro Ala Val Trp Gly Gly Leu Val Asp Val Pro Ser Gly Ser Val Ala

1605 1610 1615

Glu Leu Ser Arg His Leu Gly Thr Thr Leu Ser Gly Ala Gly Glu Asp

1620 1625 1630

Gln Val Ala Leu Arg Pro Asp Gly Thr Tyr Ala Arg Arg Trp Cys Arg

1635 1640 1645

Ala Gly Ala Gly Gly Thr Gly Arg Trp Gln Pro Arg Gly Thr Val Leu

1650 1655 1660

Val Thr Gly Gly Thr Gly Gly Val Gly Arg His Val Ala Arg Trp Leu
1665 1670 1675 1680

Ala Arg Gln Gly Thr Pro Cys Leu Val Leu Ala Ser Arg Arg Gly Pro

1685 1690 1695

Asp Ala Asp Gly Val Glu Glu Leu Leu Thr Glu Leu Ala Asp Leu Gly

1700 1705 1710

Thr Arg Ala Thr Val Thr Ala Cys Asp Val Thr Asp Arg Glu Gln Leu

1715 1720 1725

Arg Ala Leu Leu Ala Thr Val Asp Asp Glu His Pro Leu Ser Ala Val

1730 1735 1740

Phe His Val Ala Ala Thr Leu Asp Asp Gly Thr Val Glu Thr Leu Thr
1745 1750 1755 1760

Gly Asp Arg Ile Glu Arg Ala Asn Arg Ala Lys Val Leu Gly Ala Arg

1765 1770 1775

Asn Leu His Glu Leu Thr Arg Asp Ala Asp Leu Asp Ala Phe Val Leu

1780 1785 1790

Phe Ser Ser Ser Thr Ala Ala Phe Gly Ala Pro Gly Leu Gly Gly Tyr

1795 1800 1805

Val Pro Gly Asn Ala Tyr Leu Asp Gly Leu Ala Gln Gln Arg Arg Ser

1810 1815 1820

Glu Gly Leu Pro Ala Thr Ser Val Ala Trp Gly Thr Trp Ala Gly Ser
1825 1830 1835 1840

Gly Met Ala Glu Gly Pro Val Ala Asp Arg Phe Arg Arg His Gly Val

1845 1850 1855

Met Glu Met His Pro Asp Gln Ala Val Glu Gly Leu Arg Val Ala Leu

1860 1865 1870

Val Gln Gly Glu Val Ala Pro Ile Val Val Asp Ile Arg Trp Asp Arg

1875 1880 1885

Phe Leu Leu Ala Tyr Thr Ala Gln Arg Pro Thr Arg Leu Phe Asp Thr

1890 1895 1900

Leu Asp Glu Ala Arg Arg Ala Ala Pro Gly Pro Asp Ala Gly Pro Gly
1905 1910 1915 1920

Val Ala Ala Leu Ala Gly Leu Pro Val Gly Glu Arg Glu Lys Ala Val

1925 1930 1935

Leu Asp Leu Val Arg Thr His Ala Ala Ala Val Leu Gly His Ala Ser

1940 1945 1950

Ala Glu Gln Val Pro Val Asp Arg Ala Phe Ala Glu Leu Gly Val Asp

1955 1960 1965

Ser Leu Ser Ala Leu Glu Leu Arg Asn Arg Leu Thr Thr Ala Thr Gly

1970 1975 1980

Val Arg Leu Ala Thr Thr Thr Val Phe Asp His Pro Asp Val Arg Thr
1985 1990 1995 2000

Leu Ala Gly His Leu Ala Ala Glu Leu Gly Gly Gly Ser Gly Arg Glu

2005 2010 2015

Arg Pro Gly Gly Glu Ala Pro Thr Val Ala Pro Thr Asp Glu Pro Ile

2020 2025 2030

Ala Ile Val Gly Met Ala Cys Arg Leu Pro Gly Gly Val Asp Ser Pro

2035 2040 2045

Glu Gln Leu Trp Glu Leu Ile Val Ser Gly Arg Asp Thr Ala Ser Ala

2050 2055 2060

Ala Pro Gly Asp Arg Ser Trp Asp Pro Ala Glu Leu Met Val Ser Asp
2065 2070 2075 2080

Thr Thr Gly Thr Arg Thr Ala Phe Gly Asn Phe Met Pro Gly Ala Gly

2085 2090 2095

Glu Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Ala

2100 2105 2110

Met Asp Pro Gln Gln Arg His Ala Leu Glu Thr Thr Trp Glu Ala Leu

2115 2120 2125

Glu Asn Ala Gly Ile Arg Pro Glu Ser Leu Arg Gly Thr Asp Thr Gly

2130 2135 2140

Val Phe Val Gly Met Ser His Gln Gly Tyr Ala Thr Gly Arg Pro Lys
2145 2150 2155 2160

Pro Glu Asp Glu Val Asp Gly Tyr Leu Leu Thr Gly Asn Thr Ala Ser

2165 2170 2175

Val Ala Ser Gly Arg Ile Ala Tyr Val Leu Gly Leu Glu Gly Pro Ala

2180 2185 2190

Ile Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Leu His Val

2195 2200 2205

Ala Ala Gly Ser Leu Arg Ser Gly Asp Cys Gly Leu Ala Val Ala Gly

2210 2215 2220

Gly Val Ser Val Met Ala Gly Pro Glu Val Phe Arg Glu Phe Ser Arg
2225 2230 2235 2240

Gln Gly Ala Leu Ala Pro Asp Gly Arg Cys Lys Pro Phe Ser Asp Glu

2245 2250 2255

Ala Asp Gly Phe Gly Leu Gly Glu Gly Ser Ala Phe Val Val Leu Gln

2260 2265 2270

Arg Leu Ser Val Ala Val Arg Glu Gly Arg Arg Val Leu Gly Val Val

2275 2280 2285

Val Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala

2290 2295 2300

Pro Ser Gly Val Ala Gln Gln Arg Val Ile Arg Arg Ala Trp Gly Arg
2305 2310 2315 2320

Ala Gly Val Ser Gly Gly Asp Val Gly Val Val Glu Ala His Gly Thr

2325 2330 2335

Gly Thr Arg Leu Gly Asp Pro Val Glu Leu Gly Ala Leu Leu Gly Thr

2340 2345 2350

Tyr Gly Val Gly Arg Gly Gly Val Gly Pro Val Val Val Gly Ser Val

2355 2360 2365

Lys Ala Asn Val Gly His Val Gln Ala Ala Ala Gly Val Val Gly Val

2370 2375 2380

Ile Lys Val Val Leu Gly Leu Gly Arg Gly Leu Val Gly Pro Met Val
2385 2390 2395 2400

Cys Arg Gly Gly Leu Ser Gly Leu Val Asp Trp Ser Ser Gly Gly Leu

2405 2410 2415

Val Val Ala Asp Gly Val Arg Gly Trp Pro Val Gly Val Asp Gly Val

2420 2425 2430

Arg Arg Gly Gly Val Ser Ala Phe Gly Val Ser Gly Thr Asn Ala His

2435 2440 2445

Val Val Val Ala Glu Ala Pro Gly Ser Val Val Gly Ala Glu Arg Pro

2450 2455 2460

Val Glu Gly Ser Ser Arg Gly Leu Val Gly Val Val Gly Gly Val Val
2465 2470 2475 2480

Pro Val Val Leu Ser Ala Lys Thr Glu Thr Ala Leu His Ala Gln Ala

2485 2490 2495

Arg Arg Leu Ala Asp His Leu Glu Thr His Pro Asp Val Pro Met Thr

2500 2505 2510

Asp Val Val Trp Thr Leu Thr Gln Ala Arg Gln Arg Phe Asp Arg Arg

2515 2520 2525

Ala Val Leu Leu Ala Ala Asp Arg Thr Gln Ala Val Glu Arg Leu Arg

2530 2535 2540

Gly Leu Ala Gly Gly Glu Pro Gly Thr Gly Val Val Ser Gly Val Ala
2545 2550 2555 2560

Ser Gly Gly Gly Val Val Phe Val Phe Pro Gly Gln Gly Gly Gln Trp

2565 2570 2575

Val Gly Met Ala Arg Gly Leu Leu Ser Val Pro Val Phe Val Glu Ser

2580 2585 2590

Val Val Glu Cys Asp Ala Val Val Ser Ser Val Val Gly Phe Ser Val

2595 2600 2605

Leu Gly Val Leu Glu Gly Arg Ser Gly Ala Pro Ser Leu Asp Arg Val

2610 2615 2620

Asp Val Val Gln Pro Val Leu Phe Val Val Met Val Ser Leu Ala Arg
2625 2630 2635 2640

Leu Trp Arg Trp Cys Gly Val Val Pro Ala Ala Val Val Gly His Ser

2645 2650 2655

Gln Gly Glu Ile Ala Ala Ala Val Val Ala Gly Val Leu Ser Val Gly

2660 2665 2670

Asp Gly Ala Arg Val Val Ala Leu Arg Ala Arg Ala Leu Arg Ala Leu

2675 2680 2685

Ala Gly His Gly Gly Met Ala Ser Val Arg Arg Gly Arg Asp Asp Val

2690 2695 2700

Gln Lys Leu Leu Asp Ser Gly Pro Trp Thr Gly Lys Leu Glu Ile Ala
2705 2710 2715 2720

Ala Val Asn Gly Pro Asp Ala Val Val Val Ser Gly Asp Pro Arg Ala

2725 2730 2735

Val Thr Glu Leu Val Glu His Cys Asp Gly Ile Gly Val Arg Ala Arg

2740 2745 2750

Thr Ile Pro Val Asp Tyr Ala Ser His Ser Ala Gln Val Glu Ser Leu

2755 2760 2765

Arg Glu Glu Leu Leu Ser Val Leu Ala Gly Ile Glu Gly Arg Pro Ala

2770 2775 2780

Thr Val Pro Phe Tyr Ser Thr Leu Thr Gly Gly Phe Val Asp Gly Thr
2785 2790 2795 2800

Glu Leu Asp Ala Asp Tyr Trp Tyr Arg Asn Leu Arg His Pro Val Arg

2805 2810 2815

Phe His Ala Ala Val Glu Ala Leu Ala Ala Arg Asp Leu Thr Thr Phe

2820 2825 2830

Val Glu Val Ser Pro His Pro Val Leu Ser Met Ala Val Gly Glu Thr

2835 2840 2845

Leu Ala Asp Val Glu Ser Ala Val Thr Val Gly Thr Leu Glu Arg Asp

2850 2855 2860

Thr Asp Asp Val Glu Arg Phe Leu Thr Ser Leu Ala Glu Ala His Val
2865 2870 2875 2880

His Gly Val Pro Val Asp Trp Ala Ala Val Leu Gly Ser Gly Thr Leu

2885 2890 2895

Val Asp Leu Pro Thr Tyr Pro Phe Gln Gly Arg Arg Phe Trp Leu His

2900 2905 2910

Pro Asp Arg Gly Pro Arg Asp Asp Val Ala Asp Trp Phe His Arg Val

2915 2920 2925

Asp Trp Thr Ala Thr Ala Thr Asp Gly Ser Ala Arg Leu Asp Gly Arg

2930 2935 2940

Trp Leu Val Val Val Pro Glu Gly Tyr Thr Asp Asp Gly Trp Val Val
2945 2950 2955 2960

Glu Val Arg Ala Ala Leu Ala Ala Gly Gly Ala Glu Pro Val Val Thr

2965 2970 2975

Thr Val Glu Glu Val Thr Asp Arg Val Gly Asp Ser Asp Ala Val Val

2980 2985 2990

Ser Met Leu Gly Leu Ala Asp Asp Gly Ala Ala Glu Thr Leu Ala Leu

2995 3000 3005

Leu Arg Arg Leu Asp Ala Gln Ala Ser Thr Thr Pro Leu Trp Val Val

3010 3015 3020

Thr Val Gly Ala Val Ala Pro Ala Gly Pro Val Gln Arg Pro Glu Gln
3025 3030 3035 3040

Ala Thr Val Trp Gly Leu Ala Leu Val Ala Ser Leu Glu Arg Gly His

3045 3050 3055

Arg Trp Thr Gly Leu Leu Asp Leu Pro Gln Thr Pro Asp Pro Gln Leu

3060 3065 3070

Arg Pro Arg Leu Val Glu Ala Leu Ala Gly Ala Glu Asp Gln Val Ala

3075 3080 3085

Val Arg Ala Asp Ala Val His Ala Arg Arg Ile Val Pro Thr Pro Val

3090 3095 3100

Thr Gly Ala Gly Pro Tyr Thr Ala Pro Gly Gly Thr Ile Leu Val Thr
3105 3110 3115 3120

Gly Gly Thr Ala Gly Leu Gly Ala Val Thr Ala Arg Trp Leu Ala Glu

3125 3130 3135

Arg Gly Ala Glu His Leu Ala Leu Val Ser Arg Arg Gly Pro Gly Thr

3140 3145 3150

Ala Gly Val Asp Glu Val Val Arg Asp Leu Thr Gly Leu Gly Val Arg

3155 3160 3165

Val Ser Val His Ser Cys Asp Val Gly Asp Arg Glu Ser Val Gly Ala

3170 3175 3180

Leu Val Gln Glu Leu Thr Ala Ala Gly Asp Val Val Arg Gly Val Val
3185 3190 3195 3200

His Ala Ala Gly Leu Pro Gln Gln Val Pro Leu Thr Asp Met Asp Pro

3205 3210 3215

Ala Asp Leu Ala Asp Val Val Ala Val Lys Val Asp Gly Ala Val His

3220 3225 3230

Leu Ala Asp Leu Cys Pro Glu Ala Glu Leu Phe Leu Leu Phe Ser Ser

3235 3240 3245

Gly Ala Gly Val Trp Gly Ser Ala Arg Gln Gly Ala Tyr Ala Ala Gly

3250 3255 3260

Asn Ala Phe Leu Asp Ala Phe Ala Arg His Arg Arg Asp Arg Gly Leu
3265 3270 3275 3280

Pro Ala Thr Ser Val Ala Trp Gly Leu Trp Ala Ala Gly Gly Met Thr

3285 3290 3295

Gly Asp Gln Glu Ala Val Ser Phe Leu Arg Glu Arg Gly Val Arg Pro

3300 3305 3310

Met Ser Val Pro Arg Ala Leu Glu Ala Leu Glu Arg Val Leu Thr Ala

3315 3320 3325

Gly Glu Thr Ala Val Val Val Ala Asp Val Asp Trp Ala Ala Phe Ala

3330 3335 3340

Glu Ser Tyr Thr Ser Ala Arg Pro Arg Pro Leu Leu His Arg Leu Val
3345 3350 3355 3360

Thr Pro Ala Ala Ala Val Gly Glu Arg Asp Glu Pro Arg Glu Gln Thr

3365 3370 3375

Leu Arg Asp Arg Leu Ala Ala Leu Pro Arg Ala Glu Arg Ser Ala Glu

3380 3385 3390

Leu Val Arg Leu Val Arg Arg Asp Ala Ala Ala Val Leu Gly Ser Asp

3395 3400 3405

Ala Lys Ala Val Pro Ala Thr Thr Pro Phe Lys Asp Leu Gly Phe Asp

3410 3415 3420

Ser Leu Ala Ala Val Arg Phe Arg Asn Arg Leu Ala Ala His Thr Gly
3425 3430 3435 3440

Leu Arg Leu Pro Ala Thr Leu Val Phe Glu His Pro Asn Ala Ala Ala

3445 3450 3455

Val Ala Asp Leu Leu His Asp Arg Leu Gly Glu Ala Gly Glu Pro Thr

3460 3465 3470

Pro Val Arg Ser Val Gly Ala Gly Leu Ala Ala Leu Glu Gln Ala Leu

3475 3480 3485

Pro Asp Ala Ser Asp Thr Glu Arg Val Glu Leu Val Glu Arg Leu Glu

3490 3495 3500

Arg Met Leu Ala Gly Leu Arg Pro Glu Ala Gly Ala Gly Ala Asp Ala
3505 3510 3515 3520

Pro Thr Ala Gly Asp Asp Leu Gly Glu Ala Gly Val Asp Glu Leu Leu

3525 3530 3535

Asp Ala Leu Glu Arg Glu Leu Asp Ala Arg
3540 3545

<210> 14
<211> 3562
<212> PRT

<213> Micromonospora megalomicea
Tyr Pro Phe Gln Arg Lys Pro Tyr Trp Leu Arg Ser Ser Ala Pro Ala

900 905 910

Pro Ala Ser His Asp Leu Ala Tyr Arg Val Ser Trp Thr Pro Ile Thr

915 920 925

Pro Pro Gly Asp Gly Val Leu Asp Gly Asp Trp Leu Val Val His Pro

930 935 940

Gly Gly Ser Thr Gly Trp Val Asp Gly Leu Ala Ala Ala Ile Thr Ala
945 950 955 960

Gly Gly Gly Arg Val Val Ala His Pro Val Asp Ser Val Thr Ser Arg

965 970 975

Thr Gly Leu Ala Glu Ala Leu Ala Arg Arg Asp Gly Thr Phe Arg Gly

980 985 990

Val Leu Ser Trp Val Ala Thr Asp Glu Arg His Val Glu Ala Gly Ala

995 1000 1005

Val Ala Leu Leu Thr Leu Ala Gln Ala Leu Gly Asp Ala Gly Ile Asp

1010 1015 1020

Ala Pro Leu Trp Cys Leu Thr Gln Glu Ala Val Arg Thr Pro Val Asp
1025 1030 1035 1040

Gly Asp Leu Ala Arg Pro Ala Gln Ala Ala Leu His Gly Phe Ala Gln

1045 1050 1055

Val Ala Arg Leu Glu Leu Ala Arg Arg Phe Gly Gly Val Leu Asp Leu

1060 1065 1070

Pro Ala Thr Val Asp Ala Ala Gly Thr Arg Leu Val Ala Ala Val Leu

1075 1080 1085

Ala Gly Gly Gly Glu Asp Val Val Ala Val Arg Gly Asp Arg Leu Tyr

1090 1095 1100

Gly Arg Arg Leu Val Arg Ala Thr Leu Pro Pro Pro Gly Gly Gly Phe
1105 1110 1115 1120

Thr Pro His Gly Thr Val Leu Val Thr Gly Ala Ala Gly Pro Val Gly

1125 1130 1135

Gly Arg Leu Ala Arg Trp Leu Ala Glu Arg Gly Ala Thr Arg Leu Val

1140 1145 1150

Leu Pro Gly Ala His Pro Gly Glu Glu Leu Leu Thr Ala Ile Arg Ala

1155 1160 1165

Ala Gly Ala Thr Ala Val Val Cys Glu Pro Glu Ala Glu Ala Leu Arg

1170 1175 1180

Thr Ala Ile Gly Gly Glu Leu Pro Thr Ala Leu Val His Ala Glu Thr
1185 1190 1195 1200

Leu Thr Asn Phe Ala Gly Val Ala Asp Ala Asp Pro Glu Asp Phe Ala

1205 1210 1215

Ala Thr Val Ala Ala Lys Thr Ala Leu Pro Thr Val Leu Ala Glu Val

1220 1225 1230

Leu Gly Asp His Arg Leu Glu Arg Glu Val Tyr Cys Ser Ser Val Ala

1235 1240 1245

Gly Val Trp Gly Gly Val Gly Met Ala Ala Tyr Ala Ala Gly Ser Ala

1250 1255 1260

Tyr Leu Asp Ala Leu Val Glu His Arg Arg Ala Arg Gly His Ala Ser
1265 1270 1275 1280

Ala Ser Val Ala Trp Thr Pro Trp Ala Leu Pro Gly Ala Val Asp Asp

1285 1290 1295

Gly Arg Leu Arg Glu Arg Gly Leu Arg Ser Leu Asp Val Ala Asp Ala

1300 1305 1310

Leu Gly Thr Trp Glu Arg Leu Leu Arg Ala Gly Ala Val Ser Val Ala

1315 1320 1325

Val Ala Asp Val Asp Trp Ser Val Phe Thr Glu Gly Phe Ala Ala Ile

1330 1335 1340

Arg Pro Thr Pro Leu Phe Asp Glu Leu Leu Asp Arg Arg Gly Asp Pro
1345 1350 1355 1360

Asp Gly Ala Pro Val Asp Arg Pro Gly Glu Pro Ala Gly Glu Trp Gly

1365 1370 1375

Arg Arg Ile Ala Ala Leu Ser Pro Gln Glu Gln Arg Glu Thr Leu Leu

1380 1385 1390

Thr Leu Val Gly Glu Thr Val Ala Glu Val Leu Gly His Glu Thr Gly

1395 1400 1405

Thr Glu Ile Asn Thr Arg Arg Ala Phe Ser Glu Leu Gly Leu Asp Ser

1410 1415 1420

Leu Gly Ser Met Ala Leu Arg Gln Arg Leu Ala Ala Arg Thr Gly Leu
1425 1430 1435 1440

Arg Met Pro Ala Ser Leu Val Phe Asp His Pro Thr Val Thr Ala Leu

1445 1450 1455

Ala Arg Tyr Leu Arg Arg Leu Val Val Gly Asp Ser Asp Pro Thr Pro

1460 1465 1470

Val Arg Val Phe Gly Pro Thr Asp Glu Ala Glu Pro Val Ala Val Val

1475 1480 1485

Gly Ile Gly Cys Arg Phe Pro Gly Gly Ile Ala Thr Pro Glu Asp Leu

1490 1495 1500

Trp Arg Val Val Ser Glu Gly Thr Ser Ile Thr Thr Gly Phe Pro Thr
1505 1510 1515 1520

Asp Arg Gly Trp Asp Leu Arg Arg Leu Tyr His Pro Asp Pro Asp His

1525 1530 1535

Pro Gly Thr Ser Tyr Val Asp Arg Gly Gly Phe Leu Asp Gly Ala Pro

1540 1545 1550

Asp Phe Asp Pro Gly Phe Phe Gly Ile Thr Pro Arg Glu Ala Leu Ala

1555 1560 1565

Met Asp Pro Gln Gln Arg Leu Thr Leu Glu Ile Ala Trp Glu Ala Val

1570 1575 1580

Glu Arg Ala Gly Ile Asp Pro Glu Thr Leu Leu Gly Ser Asp Thr Gly
1585 1590 1595 1600

Val Phe Val Gly Met Asn Gly Gln Ser Tyr Leu Gln Leu Leu Thr Gly

1605 1610 1615

Glu Gly Asp Arg Leu Asn Gly Tyr Gln Gly Leu Gly Asn Ser Ala Ser

1620 1625 1630

Val Leu Ser Gly Arg Val Ala Tyr Thr Phe Gly Trp Glu Gly Pro Ala

1635 1640 1645

Leu Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Ile His Leu

1650 1655 1660

Ala Met Gln Ser Leu Arg Arg Gly Glu Cys Ser Leu Ala Leu Ala Gly
1665 1670 1675 1680

Gly Val Thr Val Met Ala Asp Pro Tyr Thr Phe Val Asp Phe Ser Ala

1685 1690 1695

Gln Arg Gly Leu Ala Ala Asp Gly Arg Cys Lys Ala Phe Ser Ala Gln

1700 1705 1710

Ala Asp Gly Phe Ala Leu Ala Glu Gly Val Ala Ala Leu Val Leu Glu

1715 1720 1725

Pro Leu Ser Lys Ala Arg Arg Asn Gly His Gln Val Leu Ala Val Leu

1730 1735 1740

Arg Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn Gly Leu Ala Ala
1745 1750 1755 1760

Pro Asn Gly Pro Ser Gln Glu Arg Val Ile Arg Gln Ala Leu Thr Ala

1765 1770 1775

Ser Gly Leu Arg Pro Ala Asp Val Asp Met Val Glu Ala His Gly Thr

1780 1785 1790

Gly Thr Glu Leu Gly Asp Pro Ile Glu Ala Gly Ala Leu Ile Ala Ala

1795 1800 1805

Tyr Gly Arg Asp Arg Asp Arg Pro Leu Trp Leu Gly Ser Val Lys Thr

1810 1815 1820

Asn Ile Gly His Thr Gln Ala Ala Ala Gly Ala Ala Gly Val Ile Lys
1825 1830 1835 1840

Ala Val Leu Ala Met Arg His Gly Val Leu Pro Arg Ser Leu His Ala

1845 1850 1855

Asp Glu Leu Ser Pro His Ile Asp Trp Ala Asp Gly Lys Val Glu Val

1860 1865 1870

Leu Arg Glu Ala Arg Gln Trp Pro Pro Gly Glu Arg Pro Arg Arg Ala

1875 1880 1885

Gly Val Ser Ser Phe Gly Val Ser Gly Thr Asn Ala His Val Ile Val

1890 1895 1900

Glu Glu Ala Pro Ala Glu Pro Asp Pro Glu Pro Val Pro Ala Ala Pro
1905 1910 1915 1920

Gly Gly Pro Leu Pro Phe Val Leu His Gly Arg Ser Val Gln Thr Val

1925 1930 1935

Arg Ser Gln Ala Arg Thr Leu Ala Glu His Leu Arg Thr Thr Gly His

1940 1945 1950

Arg Asp Leu Ala Asp Thr Ala Arg Thr Leu Ala Thr Gly Arg Ala Arg

1955 1960 1965

Phe Asp Val Arg Ala Ala Val Leu Gly Thr Asp Arg Glu Gly Val Cys

1970 1975 1980

Ala Ala Leu Asp Ala Leu Ala Gln Asp Arg Pro Ser Pro Asp Val Val
1985 1990 1995 2000

Ala Pro Ala Val Phe Ala Ala Arg Thr Pro Val Leu Val Phe Pro Gly

2005 2010 2015

Gln Gly Ser Gln Trp Val Gly Met Ala Arg Asp Leu Leu Asp Ser Ser

2020 2025 2030

Glu Val Phe Ala Glu Ser Met Gly Arg Cys Ala Glu Ala Leu Ser Pro

2035 2040 2045

Tyr Thr Asp Trp Asp Leu Leu Asp Val Val Arg Gly Val Gly Asp Pro

2050 2055 2060

Asp Pro Tyr Asp Arg Val Asp Val Leu Gln Pro Val Leu Phe Ala Val
2065 2070 2075 2080

Met Val Ser Leu Ala Arg Leu Trp Gln Ser Tyr Gly Val Thr Pro Gly

2085 2090 2095

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala His Val Ala

2100 2105 2110

Gly Ala Leu Ser Leu Ala Asp Ala Ala Arg Val Val Ala Leu Arg Ser

2115 2120 2125

Arg Val Leu Arg Glu Leu Asp Asp Gln Gly Gly Met Val Ser Val Gly

2130 2135 2140

Thr Ser Arg Ala Glu Leu Asp Ser Val Leu Arg Arg Trp Asp Gly Arg
2145 2150 2155 2160

Val Ala Val Ala Ala Val Asn Gly Pro Gly Thr Leu Val Val Ala Gly

2165 2170 2175

Pro Thr Ala Glu Leu Asp Glu Phe Leu Ala Val Ala Glu Ala Arg Glu

2180 2185 2190

Met Arg Pro Arg Arg Ile Ala Val Arg Tyr Ala Ser His Ser Pro Glu

2195 2200 2205

Val Ala Arg Val Glu Gln Arg Leu Ala Ala Glu Leu Gly Thr Val Thr

2210 2215 2220

Ala Val Gly Gly Thr Val Pro Leu Tyr Ser Thr Ala Thr Gly Asp Leu
2225 2230 2235 2240

Leu Asp Thr Thr Ala Met Asp Ala Gly Tyr Trp Tyr Arg Asn Leu Arg

2245 2250 2255

Gln Pro Val Leu Phe Glu His Ala Val Arg Ser Leu Leu Glu Arg Gly

2260 2265 2270

Phe Glu Thr Phe Ile Glu Val Ser Pro His Pro Val Leu Leu Met Ala

2275 2280 2285

Val Glu Glu Thr Ala Glu Asp Ala Glu Arg Pro Val Thr Gly Val Pro

2290 2295 2300

Thr Leu Arg Arg Asp His Asp Gly Pro Ser Glu Phe Leu Arg Asn Leu
2305 2310 2315 2320

Leu Gly Ala His Val His Gly Val Asp Val Asp Leu Arg Pro Ala Val

2325 2330 2335

Ala His Gly Arg Leu Val Asp Leu Pro Thr Tyr Pro Phe Asp Arg Gln

2340 2345 2350

Arg Leu Trp Pro Lys Pro His Arg Arg Ala Asp Thr Ser Ser Leu Gly

2355 2360 2365

Val Arg Asp Ser Thr His Pro Leu Leu His Ala Ala Val Asp Val Pro

2370 2375 2380

Gly His Gly Gly Ala Val Phe Thr Gly Arg Leu Ser Pro Asp Glu Gln
2385 2390 2395 2400

Gln Trp Leu Thr Gln His Val Val Gly Gly Arg Asn Leu Val Pro Gly

2405 2410 2415

Ser Val Leu Val Asp Leu Ala Leu Thr Ala Gly Ala Asp Val Gly Val

2420 2425 2430

Pro Val Leu Glu Glu Leu Val Leu Gln Gln Pro Leu Val Leu Thr Ala

2435 2440 2445

Ala Gly Ala Leu Leu Arg Leu Ser Val Gly Ala Ala Asp Glu Asp Gly

2450 2455 2460

Arg Arg Pro Val Glu Ile His Ala Ala Glu Asp Val Ser Asp Pro Ala
2465 2470 2475 2480

Glu Ala Arg Trp Ser Ala Tyr Ala Thr Gly Thr Leu Ala Val Gly Val

2485 2490 2495

Ala Gly Gly Gly Arg Asp Gly Thr Gln Trp Pro Pro Pro Gly Ala Thr

2500 2505 2510

Ala Leu Thr Leu Thr Asp His Tyr Asp Thr Leu Ala Glu Leu Gly Tyr

2515 2520 2525

Glu Tyr Gly Pro Ala Phe Gln Ala Leu Arg Ala Ala Trp Gln His Gly

2530 2535 2540

Asp Val Val Tyr Ala Glu Val Ser Leu Asp Ala Val Glu Glu Gly Tyr
2545 2550 2555 2560

Ala Phe Asp Pro Val Leu Leu Asp Ala Val Ala Gln Thr Phe Gly Leu

2565 2570 2575

Thr Ser Arg Ala Pro Gly Lys Leu Pro Phe Ala Trp Arg Gly Val Thr

2580 2585 2590

Leu His Ala Thr Gly Ala Thr Ala Val Arg Val Val Ala Thr Pro Ala

2595 2600 2605

Gly Pro Asp Ala Val Ala Leu Arg Val Thr Asp Pro Thr Gly Gln Leu

2610 2615 2620

Val Ala Thr Val Asp Ala Leu Val Val Arg Asp Ala Gly Ala Asp Arg
2625 2630 2635 2640

Asp Gln Pro Arg Gly Arg Asp Gly Asp Leu His Arg Leu Glu Trp Val

2645 2650 2655

Arg Leu Ala Thr Pro Asp Pro Thr Pro Ala Ala Val Val His Val Ala

2660 2665 2670

Ala Asp Gly Leu Asp Asp Leu Leu Arg Ala Gly Gly Pro Ala Pro Gln

2675 2680 2685

Ala Val Val Val Arg Tyr Arg Pro Asp Gly Asp Asp Pro Thr Ala Glu

2690 2695 2700

Ala Arg His Gly Val Leu Trp Ala Ala Thr Leu Val Arg Arg Trp Leu
2705 2710 2715 2720

Asp Asp Asp Arg Trp Pro Ala Thr Thr Leu Val Val Ala Thr Ser Ala

2725 2730 2735

Gly Val Glu Val Ser Pro Gly Asp Asp Val Pro Arg Pro Gly Ala Ala

2740 2745 2750

Ala Val Trp Gly Val Leu Arg Cys Ala Gln Ala Glu Ser Pro Asp Arg

2755 2760 2765

Phe Val Leu Val Asp Gly Asp Pro Glu Thr Pro Pro Ala Val Pro Asp

2770 2775 2780

Asn Pro Gln Leu Ala Val Arg Asp Gly Ala Val Phe Val Pro Arg Leu
2785 2790 2795 2800

Thr Pro Leu Ala Gly Pro Val Pro Ala Val Ala Asp Arg Ala Tyr Arg

2805 2810 2815

Leu Val Pro Gly Asn Gly Gly Ser Ile Glu Ala Val Ala Phe Ala Pro

2820 2825 2830

Val Pro Asp Ala Asp Arg Pro Leu Ala Pro Glu Glu Val Arg Val Ala
2835 2840 2845

Val Arg Ala Thr Gly Val Asn Phe Arg Asp Val Leu Leu Ala Leu Gly

2850 2855 2860

Met Tyr Pro Glu Pro Ala Glu Met Gly Thr Glu Ala Ser Gly Val Val
2865 2870 2875 2880

Thr Glu Val Gly Ser Gly Val Arg Arg Phe Thr Pro Gly Gln Ala Val

2885 2890 2895

Thr Gly Leu Phe Gln Gly Ala Phe Gly Pro Val Ala Val Ala Asp His

2900 2905 2910

Arg Leu Leu Thr Pro Val Pro Asp Gly Trp Arg Ala Val Asp Ala Ala

2915 2920 2925

Ala Val Pro Ile Ala Phe Thr Thr Ala His Tyr Ala Leu His Asp Leu

2930 2935 2940

Ala Gly Leu Gln Ala Gly Gln Ser Val Leu Val His Ala Ala Ala Gly
2945 2950 2955 2960

Gly Val Gly Met Ala Ala Val Ala Leu Ala Arg Arg Ala Gly Ala Glu

2965 2970 2975

Val Phe Ala Thr Ala Ser Pro Ala Lys His Pro Thr Leu Arg Ala Leu

2980 2985 2990

Gly Leu Asp Asp Asp His Ile Ala Ser Ser Arg Glu Ser Gly Phe Gly

2995 3000 3005

Glu Arg Phe Ala Ala Arg Thr Gly Gly Arg Gly Val Asp Val Val Leu

3010 3015 3020

Asn Ser Leu Thr Gly Asp Leu Leu Asp Glu Ser Ala Arg Leu Leu Ala
3025 3030 3035 3040

Asp Gly Gly Val Phe Val Glu Met Gly Lys Thr Asp Leu Arg Pro Ala

3045 3050 3055

Glu Gln Phe Arg Gly Arg Tyr Val Pro Phe Asp Leu Ala Glu Ala Gly

3060 3065 3070

Pro Asp Arg Leu Gly Glu Ile Leu Glu Glu Val Val Gly Leu Leu Ala

3075 3080 3085

Ala Gly Ala Leu Asp Arg Leu Pro Val Ser Val Trp Glu Leu Ser Ala

3090 3095 3100

Ala Pro Ala Ala Leu Thr His Met Ser Arg Gly Arg His Val Gly Lys
3105 3110 3115 3120

Leu Val Leu Thr Gln Pro Ala Pro Val His Pro Asp Gly Thr Val Leu

3125 3130 3135

Val Thr Gly Gly Thr Gly Thr Leu Gly Arg Leu Val Ala Arg His Leu

3140 3145 3150

Val Thr Gly His Gly Val Pro His Leu Leu Val Ala Ser Arg Arg Gly

3155 3160 3165

Pro Ala Ala Pro Gly Ala Ala Glu Leu Arg Ala Asp Val Glu Gly Leu

3170 3175 3180

Gly Ala Thr Ile Glu Ile Val Ala Cys Asp Thr Ala Asp Arg Glu Ala
3185 3190 3195 3200

Leu Ala Ala Leu Leu Asp Ser Ile Pro Ala Asp Arg Pro Leu Thr Gly

3205 3210 3215

Val Val His Thr Ala Gly Val Leu Ala Asp Gly Leu Val Thr Ser Ile

3220 3225 3230

Asp Gly Thr Ala Thr Asp Gln Val Leu Arg Ala Lys Val Asp Ala Ala

3235 3240 3245

Trp His Leu His Asp Leu Thr Arg Asp Ala Asp Leu Ser Phe Phe Val

3250 3255 3260

Leu Phe Ser Ser Ala Ala Ser Val Leu Ala Gly Pro Gly Gln Gly Val
3265 3270 3275 3280

Tyr Ala Ala Ala Asn Gly Val Leu Asn Ala Leu Ala Gly Gln Arg Arg

3285 3290 3295

Ala Leu Gly Leu Pro Ala Lys Ala Leu Gly Trp Gly Leu Trp Ala Gln

3300 3305 3310

Ala Ser Glu Met Thr Ser Gly Leu Gly Asp Arg Ile Ala Arg Thr Gly

3315 3320 3325

Val Ala Ala Leu Pro Thr Glu Arg Ala Leu Ala Leu Phe Asp Ala Ala

3330 3335 3340

Leu Arg Ser Gly Gly Glu Val Leu Phe Pro Leu Ser Val Asp Arg Ser 
3345 3350 3355 3360 

Ala Leu Arg Arg Ala Glu Tyr Val Pro Glu Val Leu Arg Gly Ala Val 

3365 3370 3375 

Arg Ser Thr Pro Arg Ala Ala Asn Arg Ala Glu Thr Pro Gly Arg Gly 

3380 3385 3390 

Leu Leu Asp Arg Leu Val Gly Ala Pro Glu Thr Asp Gin Val Ala Ala 

3395 3400 3405 

Leu Ala Glu Leu Val Arg Ser His Ala Ala Ala Val Ala Gly Tyr Asp 

3410 3415 3420 

Ser Ala Asp Gin Leu Pro Glu Arg Lys Ala Phe Lys Asp Leu Gly Phe 
3425 3430 3435 3440 

Asp Ser Leu Ala Ala Val Glu Leu Arg Asn Arg Leu Gly Val Thr Thr 

3445 3450 3455 

Gly Val Arg Leu Pro Ser Thr Leu Val Phe Asp His Pro Thr Pro Leu 

3460 3465 3470 

Ala Val Ala Glu His Leu Arg Ser Glu Leu Phe Ala Asp Ser Ala Pro 

3475 3480 3485 

Asp Val Gly Val Gly Ala Arg Leu Asp Asp Leu Glu Arg Ala Leu Asp 

3490 3495 3500 

Ala Leu Pro Asp Ala Gln Gly His Ala Asp Val Gly Ala Arg Leu Glu
3505 3510 3515 3520

Ala Leu Leu Arg Arg Trp Gln Ser Arg Arg Pro Pro Glu Thr Glu Pro

3525 3530 3535

Val Thr Ile Ser Asp Asp Ala Ser Asp Asp Glu Leu Phe Ser Met Leu

3540 3545 3550 

Asp Arg Arg Leu Gly Gly Gly Gly Asp Val 
3555 3560 

<210> 15 
<211> 3201
<212> PRT

<213> Micromonospora megalomicea
1125 1130 1135

Gly Thr Val Leu Val Thr Gly Gly Thr Gly Gly Ile Gly Ala His Leu

1140 1145 1150 

Ala Arg Trp Leu Ala Gly Ala Gly Ala Glu His Leu Val Leu Leu Asn 

1155 1160 1165 

Arg Arg Gly Ala Glu Ala Ala Gly Ala Ala Asp Leu Arg Asp Glu Leu 
1170 1175 1180

Val Ala Leu Gly Thr Gly Val Thr Ile Thr Ala Cys Asp Val Ala Asp
1185 1190 1195 1200

Arg Asp Arg Leu Ala Ala Val Leu Asp Ala Ala Arg Ala Gln Gly Arg

1205 1210 1215

Val Val Thr Ala Val Phe His Ala Ala Gly Ile Ser Arg Ser Thr Ala

1220 1225 1230

Val Gln Glu Leu Thr Glu Ser Glu Phe Thr Glu Ile Thr Asp Ala Lys

1235 1240 1245 

Val Arg Gly Thr Ala Asn Leu Ala Glu Leu Cys Pro Glu Leu Asp Ala 

1250 1255 1260 

Leu Val Leu Phe Ser Ser Asn Ala Ala Val Trp Gly Ser Pro Gly Leu 
1265 1270 1275 1280 

Ala Ser Tyr Ala Ala Gly Asn Ala Phe Leu Asp Ala Phe Ala Arg Arg 

1285 1290 1295 

Gly Arg Arg Ser Gly Leu Pro Val Thr Ser Ile Ala Trp Gly Leu Trp

1300 1305 1310

Ala Gly Gln Asn Met Ala Gly Thr Glu Gly Gly Asp Tyr Leu Arg Ser

1315 1320 1325

Gln Gly Leu Arg Ala Met Asp Pro Gln Arg Ala Ile Glu Glu Leu Arg

1330 1335 1340 

Thr Thr Leu Asp Ala Gly Asp Pro Trp Val Ser Val Val Asp Leu Asp 
1345 1350 1355 1360 

Arg Glu Arg Phe Val Glu Leu Phe Thr Ala Ala Arg Arg Arg Pro Leu 

1365 1370 1375 

Phe Asp Glu Leu Gly Gly Val Arg Ala Gly Ala Glu Glu Thr Gly Gln

1380 1385 1390

Glu Ser Asp Leu Ala Arg Arg Leu Ala Ser Met Pro Glu Ala Glu Arg 

1395 1400 1405 

His Glu His Val Ala Arg Leu Val Arg Ala Glu Val Ala Ala Val Leu 

1410 1415 1420 

Gly His Gly Thr Pro Thr Val Ile Glu Arg Asp Val Ala Phe Arg Asp
1425 1430 1435 1440 

Leu Gly Phe Asp Ser Met Thr Ala Val Asp Leu Arg Asn Arg Leu Ala 

1445 1450 1455 

Ala Val Thr Gly Val Arg Val Ala Thr Thr Ile Val Phe Asp His Pro

1460 1465 1470 

Thr Val Asp Arg Leu Thr Ala His Tyr Leu Glu Arg Leu Val Gly Glu 

1475 1480 1485 

Pro Glu Ala Thr Thr Pro Ala Ala Ala Val Val Pro Gln Ala Pro Gly

1490 1495 1500

Glu Ala Asp Glu Pro Ile Ala Ile Val Gly Met Ala Cys Arg Leu Ala
1505 1510 1515 1520

Gly Gly Val Arg Thr Pro Asp Gln Leu Trp Asp Phe Ile Val Ala Asp

1525 1530 1535 

Gly Asp Ala Val Thr Glu Met Pro Ser Asp Arg Ser Trp Asp Leu Asp 

1540 1545 1550 

Ala Leu Phe Asp Pro Asp Pro Glu Arg His Gly Thr Ser Tyr Ser Arg 

1555 1560 1565

His Gly Ala Phe Leu Asp Gly Ala Ala Asp Phe Asp Ala Ala Phe Phe 

1570 1575 1580 

Gly Ile Ser Pro Arg Glu Ala Leu Ala Met Asp Pro Gln Gln Arg Gln
1585 1590 1595 1600

Val Leu Glu Thr Thr Trp Glu Leu Phe Glu Asn Ala Gly Ile Asp Pro

1605 1610 1615

His Ser Leu Arg Gly Thr Asp Thr Gly Val Phe Leu Gly Ala Ala Tyr

1620 1625 1630

Gln Gly Tyr Gly Gln Asn Ala Gln Val Pro Lys Glu Ser Glu Gly Tyr

1635 1640 1645

Leu Leu Thr Gly Gly Ser Ser Ala Val Ala Ser Gly Arg Ile Ala Tyr
1650 1655 1660
Val Leu Gly Leu Glu Gly Pro Ala Ile Thr Val Asp Thr Ala Cys Ser
1665 1670 1675 1680 

Ser Ser Leu Val Ala Leu His Val Ala Ala Gly Ser Leu Arg Ser Gly 

1685 1690 1695 

Asp Cys Gly Leu Ala Val Ala Gly Gly Val Ser Val Met Ala Gly Pro 

1700 1705 1710 

Glu Val Phe Thr Glu Phe Ser Arg Gln Gly Ala Leu Ala Pro Asp Gly

1715 1720 1725

Arg Cys Lys Pro Phe Ser Asp Gln Ala Asp Gly Phe Gly Phe Ala Glu

1730 1735 1740

Gly Val Ala Val Val Leu Leu Gln Arg Leu Ser Val Ala Val Arg Glu
1745 1750 1755 1760

Gly Arg Arg Val Leu Gly Val Val Val Gly Ser Ala Val Asn Gln Asp

1765 1770 1775

Gly Ala Ser Asn Gly Leu Ala Ala Pro Ser Gly Val Ala Gln Gln Arg

1780 1785 1790

Val Ile Arg Arg Ala Trp Gly Arg Ala Gly Val Ser Gly Gly Asp Val

1795 1800 1805 

Gly Val Val Glu Ala His Gly Thr Gly Thr Arg Leu Gly Asp Pro Val 

1810 1815 1820 

Glu Leu Gly Ala Leu Leu Gly Thr Tyr Gly Val Gly Arg Gly Gly Val 
1825 1830 1835 1840 

Gly Pro Val Val Val Gly Ser Val Lys Ala Asn Val Gly His Val Gln

1845 1850 1855

Ala Ala Ala Gly Val Val Gly Val Ile Lys Val Val Leu Gly Leu Gly

1860 1865 1870 

Arg Gly Leu Val Gly Pro Met Val Cys Arg Gly Gly Leu Ser Gly Leu 

1875 1880 1885 

Val Asp Trp Ser Ser Gly Gly Leu Val Val Ala Asp Gly Val Arg Gly 

1890 1895 1900

Trp Pro Val Gly Val Asp Gly Val Arg Arg Gly Gly Val Ser Ala Phe
1905 1910 1915 1920

Gly Val Ser Gly Thr Asn Ala His Val Val Val Ala Glu Ala Pro Gly 

1925 1930 1935 

Ser Val Val Gly Ala Glu Arg Pro Val Glu Gly Ser Ser Arg Gly Leu 

1940 1945 1950 

Val Gly Val Ala Gly Gly Val Val Pro Val Val Leu Ser Ala Lys Thr 

1955 1960 1965 

Glu Thr Ala Leu Thr Glu Leu Ala Arg Arg Leu His Asp Ala Val Asp 

1970 1975 1980 

Asp Thr Val Ala Leu Pro Ala Val Ala Ala Thr Leu Ala Thr Gly Arg 
1985 1990 1995 2000 

Ala His Leu Pro Tyr Arg Ala Ala Leu Leu Ala Arg Asp His Asp Glu 

2005 2010 2015 

Leu Arg Asp Arg Leu Arg Ala Phe Thr Thr Gly Ser Ala Ala Pro Gly 

2020 2025 2030 

Val Val Ser Gly Val Ala Ser Gly Gly Gly Val Val Phe Val Phe Pro 

2035 2040 2045 

Gly Gln Gly Gly Gln Trp Val Gly Met Ala Arg Gly Leu Leu Ser Val

2050 2055 2060 

Pro Val Phe Val Glu Ser Val Val Glu Cys Asp Ala Val Val Ser Ser 
2065 2070 2075 2080 

Val Val Gly Phe Ser Val Leu Gly Val Leu Glu Gly Arg Ser Gly Ala 

2085 2090 2095 

Pro Ser Leu Asp Arg Val Asp Val Val Gln Pro Val Leu Phe Val Val

2100 2105 2110

Met Val Ser Leu Ala Arg Leu Trp Arg Trp Cys Gly Val Val Pro Ala

2115 2120 2125

Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala Val Val Ala

2130 2135 2140

Gly Val Leu Ser Val Gly Asp Gly Ala Arg Val Val Ala Leu Arg Ala
2145 2150 2155 2160

Arg Ala Leu Arg Ala Leu Ala Gly His Gly Gly Met Val Ser Leu Ala 

2165 2170 2175 

Val Ser Ala Glu Arg Ala Arg Glu Leu Ile Ala Pro Trp Ser Asp Arg

2180 2185 2190

Ile Ser Val Ala Ala Val Asn Ser Pro Thr Ser Val Val Val Ser Gly

2195 2200 2205

Asp Pro Gln Ala Leu Ala Ala Leu Val Ala His Cys Ala Glu Thr Gly

2210 2215 2220

Glu Arg Ala Lys Thr Leu Pro Val Asp Tyr Ala Ser His Ser Ala His
2225 2230 2235 2240

Val Glu Gln Ile Arg Asp Thr Ile Leu Thr Asp Leu Ala Asp Val Thr

2245 2250 2255 

Ala Arg Arg Pro Asp Val Ala Leu Tyr Ser Thr Leu His Gly Ala Arg 

2260 2265 2270 

Gly Ala Gly Thr Asp Met Asp Ala Arg Tyr Trp Tyr Asp Asn Leu Arg 

2275 2280 2285 

Ser Pro Val Arg Phe Asp Glu Ala Val Glu Ala Ala Val Ala Asp Gly 

2290 2295 2300 

Tyr Arg Val Phe Val Glu Met Ser Pro His Pro Val Leu Thr Ala Ala 
2305 2310 2315 2320 

Val Gln Glu Ile Asp Asp Glu Thr Val Ala Ile Gly Ser Leu His Arg

2325 2330 2335 

Asp Thr Gly Glu Arg His Leu Val Ala Glu Leu Ala Arg Ala His Val 

2340 2345 2350 

His Gly Val Pro Val Asp Trp Arg Ala Ile Leu Pro Ala Thr His Pro

2355 2360 2365 

Val Pro Leu Pro Asn Tyr Pro Phe Glu Ala Thr Arg Tyr Trp Leu Ala 

2370 2375 2380 

Pro Thr Ala Ala Asp Gln Val Ala Asp His Arg Tyr Arg Val Asp Trp
2385 2390 2395 2400

Arg Pro Leu Ala Thr Thr Pro Ala Glu Leu Ser Gly Ser Tyr Leu Val

2405 2410 2415 

Phe Gly Asp Ala Pro Glu Thr Leu Gly His Ser Val Glu Lys Ala Gly 

2420 2425 2430 

Gly Leu Leu Val Pro Val Ala Ala Pro Asp Arg Glu Ser Leu Ala Val 

2435 2440 2445 

Ala Leu Asp Glu Ala Ala Gly Arg Leu Ala Gly Val Leu Ser Phe Ala 

2450 2455 2460 

Ala Asp Thr Ala Thr His Leu Ala Arg His Arg Leu Leu Gly Glu Ala 
2465 2470 2475 2480 

Asp Val Glu Ala Pro Leu Trp Leu Val Thr Ser Gly Gly Val Ala Leu 

2485 2490 2495 

Asp Asp His Asp Pro Ile Asp Cys Asp Gln Ala Met Val Trp Gly Ile

2500 2505 2510

Gly Arg Val Met Gly Leu Glu Thr Pro His Arg Trp Gly Gly Leu Val

2515 2520 2525

Asp Val Thr Val Glu Pro Thr Ala Glu Asp Gly Val Val Phe Ala Ala

2530 2535 2540

Leu Leu Ala Ala Asp Asp His Glu Asp Gln Val Ala Leu Arg Asp Gly
2545 2550 2555 2560

Ile Arg His Gly Arg Arg Leu Val Arg Ala Pro Leu Thr Thr Arg Asn

2565 2570 2575 

Ala Arg Trp Thr Pro Ala Gly Thr Ala Leu Val Thr Gly Gly Thr Gly 

2580 2585 2590 

Ala Leu Gly Gly His Val Ala Arg Tyr Leu Ala Arg Ser Gly Val Thr 

2595 2600 2605 

Asp Leu Val Leu Leu Ser Arg Ser Gly Pro Asp Ala Pro Gly Ala Ala 

2610 2615 2620 

Glu Leu Ala Ala Glu Leu Ala Asp Leu Gly Ala Glu Pro Arg Val Glu 
2625 2630 2635 2640 
Ala Cys Asp Val Thr Asp Gly Pro Arg Leu Arg Ala Leu Val Gln Glu

2645 2650 2655

Leu Arg Glu Gln Asp Arg Pro Val Arg Ile Val Val His Thr Ala Gly

2660 2665 2670

Val Pro Asp Ser Arg Pro Leu Asp Arg Ile Asp Glu Leu Glu Ser Val

2675 2680 2685 

Ser Ala Ala Lys Val Thr Gly Ala Arg Leu Leu Asp Glu Leu Cys Pro 

2690 2695 2700 

Asp Ala Asp Thr Phe Val Leu Phe Ser Ser Gly Ala Gly Val Trp Gly 
2705 2710 2715 2720 

Ser Ala Asn Leu Gly Ala Tyr Ala Ala Ala Asn Ala Tyr Leu Asp Ala 

2725 2730 2735 

Leu Ala His Arg Arg Arg Gln Ala Gly Arg Ala Ala Thr Ser Val Ala

2740 2745 2750 

Trp Gly Ala Trp Ala Gly Asp Gly Met Ala Thr Gly Asp Leu Asp Gly 

2755 2760 2765 

Leu Thr Arg Arg Gly Leu Arg Ala Met Ala Pro Asp Arg Ala Leu Arg 

2770 2775 2780 

Ala Cys Thr Arg Arg Trp Thr Thr His Asp Thr Cys Val Ser Val Ala 
2785 2790 2795 2800 

Asp Val Asp Trp Asp Arg Phe Ala Val Gly Phe Thr Ala Ala Arg Pro 

2805 2810 2815 

Arg Pro Leu Ile Asp Glu Leu Val Thr Ser Ala Pro Val Ala Ala Pro

2820 2825 2830

Thr Ala Ala Ala Ala Pro Val Pro Ala Met Thr Ala Asp Gln Leu Leu

2835 2840 2845

Gln Phe Thr Arg Ser His Val Ala Ala Ile Leu Gly His Gln Asp Pro

2850 2855 2860

Asp Ala Val Gly Leu Asp Gln Pro Phe Thr Glu Leu Gly Phe Asp Ser
2865 2870 2875 2880

Leu Thr Ala Val Gly Leu Arg Asn Gln Leu Gln Gln Ala Thr Gly Arg

2885 2890 2895

Thr Leu Pro Ala Ala Leu Val Phe Gln His Pro Thr Val Arg Arg Leu

2900 2905 2910

Ala Asp His Leu Ala Gln Gln Leu Asp Val Gly Thr Ala Pro Val Glu

2915 2920 2925

Ala Thr Gly Ser Val Leu Arg Asp Gly Tyr Arg Arg Ala Gly Gln Thr

2930 2935 2940 

Gly Asp Val Arg Ser Tyr Leu Asp Leu Leu Ala Asn Leu Ser Glu Phe 
2945 2950 2955 2960 

Arg Glu Arg Phe Thr Asp Ala Ala Ser Leu Gly Gly Gln Leu Glu Leu

2965 2970 2975

Val Asp Leu Ala Asp Gly Ser Gly Pro Val Thr Val Ile Cys Cys Ala

2980 2985 2990 

Gly Thr Ala Ala Leu Ser Gly Pro His Glu Phe Ala Arg Leu Ala Ser 

2995 3000 3005 

Ala Leu Arg Gly Thr Val Pro Val Arg Ala Leu Ala Gln Pro Gly Tyr

3010 3015 3020 

Glu Ala Gly Glu Pro Val Pro Ala Ser Met Glu Ala Val Leu Gly Val 
3025 3030 3035 3040 

Gln Ala Asp Ala Val Leu Ala Ala Gln Gly Asp Thr Pro Phe Val Leu

3045 3050 3055 

Val Gly His Ser Ala Gly Ala Leu Met Ala Tyr Ala Leu Ala Thr Glu 

3060 3065 3070 

Leu Ala Asp Arg Gly His Pro Pro Arg Gly Val Val Leu Leu Asp Val 

3075 3080 3085 

Tyr Pro Pro Gly His Gln Glu Ala Val His Ala Trp Leu Gly Glu Leu

3090 3095 3100

Thr Ala Ala Leu Phe Asp His Glu Thr Val Arg Met Asp Asp Thr Arg
3105 3110 3115 3120

Leu Thr Ala Leu Gly Ala Tyr Asp Arg Leu Thr Gly Arg Trp Arg Pro
3125 3130 3135

Arg Asp Thr Gly Leu Pro Thr Leu Val Val Ala Ala Ser Glu Pro Met 

3140 3145 3150 

Gly Glu Trp Pro Asp Asp Gly Trp Gln Ser Thr Trp Pro Phe Gly His

3155 3160 3165

Asp Arg Val Thr Val Pro Gly Asp His Phe Ser Met Val Gln Glu His

3170 3175 3180

Ala Asp Ala Ile Ala Arg His Ile Asp Ala Trp Leu Ser Gly Glu Arg
3185 3190 3195 3200 

Ala 



<210> 16 
<211> 358 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 16 



Met Asn Thr Thr Asp Arg Ala Val Leu Gly Arg Arg Leu Gln Met Ile
1 5 10 15

Arg Gly Leu Tyr Trp Gly Tyr Gly Ser Asn Gly Asp Pro Tyr Pro Met
20 25 30

Leu Leu Cys Gly His Asp Asp Asp Pro His Arg Trp Tyr Arg Gly Leu
35 40 45

Gly Gly Ser Gly Val Arg Arg Ser Arg Thr Glu Thr Trp Val Val Thr
50 55 60

Asp His Ala Thr Ala Val Arg Val Leu Asp Asp Pro Thr Phe Thr Arg
65 70 75 80

Ala Thr Gly Arg Thr Pro Glu Trp Met Arg Ala Ala Gly Ala Pro Ala
85 90 95

Ser Thr Trp Ala Gln Pro Phe Arg Asp Val His Ala Ala Ser Trp Asp
100 105 110

Ala Glu Leu Pro Asp Pro Gln Glu Val Glu Asp Arg Leu Thr Gly Leu
115 120 125

Leu Pro Ala Pro Gly Thr Arg Leu Asp Leu Val Arg Asp Leu Ala Trp
130 135 140

Pro Met Ala Ser Arg Gly Val Gly Ala Asp Asp Pro Asp Val Leu Arg
145 150 155 160

Ala Ala Trp Asp Ala Arg Val Gly Leu Asp Ala Gln Leu Thr Pro Gln
165 170 175

Pro Leu Ala Val Thr Glu Ala Ala Ile Ala Ala Val Pro Gly Asp Pro
180 185 190

His Arg Arg Ala Leu Phe Thr Ala Val Glu Met Thr Ala Thr Ala Phe
195 200 205

Val Asp Ala Val Leu Ala Val Thr Ala Thr Ala Gly Ala Ala Gln Arg
210 215 220

Leu Ala Asp Asp Pro Asp Val Ala Ala Arg Leu Val Ala Glu Val Leu
225 230 235 240

Arg Leu His Pro Thr Ala His Leu Glu Arg Arg Thr Ala Gly Thr Glu
245 250 255

Thr Val Val Gly Glu His Thr Val Ala Ala Gly Asp Glu Val Val Val
260 265 270

Val Val Ala Ala Ala Asn Arg Asp Ala Gly Val Phe Ala Asp Pro Asp
275 280 285

Arg Leu Asp Pro Asp Arg Ala Asp Ala Asp Arg Ala Leu Ser Ala Gln
290 295 300

Arg Gly His Pro Gly Arg Leu Glu Glu Leu Val Val Val Leu Thr Thr
305 310 315 320

Ala Ala Leu Arg Ser Val Ala Lys Ala Leu Pro Gly Leu Thr Ala Gly
325 330 335

Gly Pro Val Val Arg Arg Arg Arg Ser Pro Val Leu Arg Ala Thr Ala
340 345 350

His Cys Pro Val Glu Leu
355

<210> 17 
<211> 422 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 17 



Met 


Arg Val 


v ax 


true 




JC1 


nc l. 


Ala Ser 


Lys 


Ser 


His Leu 


Phe 


Gly 


1 








5 








l 0 

X u 








i -j 




Leu 


Val 


Pro 


Leu 


Ala 


Trp 


ril a 


Pho 

rne 


Arg Ala 


Ala Gly His Glu 


Va 1 
val 


Arg 








oo 










25 






30 






IF- t 

val 


Val 


Ala 


Cor 


Pro 


Ml d 


Leu 


Thr 
nix 


nop "Op 


He 


Thr 


Ala Ala 


Gly 


Leu 






35 










40 








45 






i nr 


Ala 


Val 


Pro 


val 


Pi V 

yj x y 


Thr 
i. ux 


A Qn 


Val Asp 


Leu Val Asp Phe 


Met 


Thr 




50 










55 








60 








HIS 


Ala 


Gly 


Tyr 


Asp 


Tip 
lie 


He 


Asp 


Tyr Val 


Arg 


Ser 


Leu Asd 


Phe 


Ser 


Dj 










70 








75 








80 


<jlU 


Arg Asp 


tiU 


Al a 


Thr 


Ser 


Thr 


Tro Asd 


His 


Leu 


Leu Gly 


Met 


Gin 










85 








90 








95 




i nr 


vai 


Leu 


1 ill 


Pro 


Thr 


Phe 


i yi 


Ala Leu 


Met 


Ser 


Pro Asd 


Ser 


Leu 








1 V V 










105 






110 






vai 


UlU 


u ly 


Mot- 
net 


Tip 

lit; 


Car 
OCi 


Phe 




Ar g Ser 


Trp Arg 


Pro Asp 


Trri 
iip 


Ser 






1 1 c 
1 1 J 










120 








125 






Ser 


uiy 


Pro 


PI n 


Tnr 




Ala 


Al a 


Ser Tie 

O d 11C 


Ala 


Ala 


Thr Val 


Thr 


Glv 

oxy 




loU 










135 








140 








vai 


Mia 


HIS 


nld 


Arg 


T.en 


Leu 


T m 
i rp 


p 1 v Prn 


Asp 


He 


Thr Val 


Arg 


Ala 










1 SO 

1 J V 








155 








160 


Arg 


oin 


Lys 


rile 


T on 
Li t- U 


Gl V 


Leu 


Leu 


Pro Gly 


Gin 


Pro 


Ala Ala 


His 


Arg 










1 fiR 
X V«J 








170 








175 




<J 1U 


Asp 


Pro 


T 

Leu 


riX ct 


c;i n 


Tro 


Leu 


Thr Trp 


Ser 


Val 


Glu Arg 


Phe 


Gly 








i a n 










185 






190 






pi ,, 

b±y 


Arg 


Val 


Pro 


oin 


Asp 


Ua 1 
vol 


pi n 

Ol Li 


Pl ii T.on 
VJ 1 U XjC u 


Val 


Val 


Gly Gin 


Trn 
lip 


Thr 




195 










200 








205 






Tift 

lie 


Asp 


Pro 


nld 


Pro 


Vol 


PI V 




nl y i-ffe. LI 


Asp 


Thr 


Gly Leu 


Arg 


Thr 




210 










?1 S 








220 








vai 


Gly 


Met 


Arg 


Tyr 


\/a 1 
Val 


Asp 


Tyr 


Moil oxy 


Pro 


Ser 


Val Val 


Prrt 
1 1 \j 


nop 












^ J VJ 








235 








240 


Trp 


Leu 


His 


Asp 


pi n 


Pro 


Thr 

X 111 


A r rr 
rtl y 


Arrr Arn 
m y ni y 


Val 


Cys 


Leu Thr 


Leu 


Gly 










C. H J 
















255 




T 1 fl 

lie 


Ser 


Ser 


Arg 


PI 11 


Ben 
radii 


Ser 


lie 


Glv Gin 
uiy ox ii 


Val 


Ser 


Val Asp 


Asp 


Leu 








260 










265 






270 






Leu Gly Ala Leu Gly Asp Val Asp Ala Glu Ile Ile Ala Thr Val Asp
275 280 285

Glu Gln Gln Leu Glu Gly Val Ala His Val Pro Ala Asn Ile Arg Thr
290 295 300

Val Gly Phe Val Pro Met His Ala Leu Leu Pro Thr Cys Ala Ala Thr
305 310 315 320

Val His His Gly Gly Pro Gly Ser Trp His Thr Ala Ala Ile His Gly
325 330 335

Val Pro Gln Val Ile Leu Pro Asp Gly Trp Asp Thr Gly Val Arg Ala
340 345 350

Gln Arg Thr Glu Asp Gln Gly Ala Gly Ile Ala Leu Pro Val Pro Glu
355 360 365

Leu Thr Ser Asp Gln Leu Arg Glu Ala Val Arg Arg Val Leu Asp Asp
370 375 380

Pro Ala Phe Thr Ala Gly Ala Ala Arg Met Arg Ala Asp Met Leu Ala
385 390 395 400

Glu Pro Ser Pro Ala Glu Val Val Asp Val Cys Ala Gly Leu Val Gly
405 410 415

Glu Arg Thr Ala Val Gly
420

<210> 18 
<211> 323 
<212> PRT 

<213> Micromonospora megalomicea 



<400> 18 



Met 


Ser 


Thr 


Asp 


Ala 


Thr 


HIS 


val 


Arg Leu Gly Arg 


Cys Ala Leu Leu 


1 








c 

3 








1U 


1 3 


Thr 


Ser 


Arg 


Leu 


Trp 


Leu 


Gly 


inr 


Ala Ala Leu Ala 


Gly Gin Asp Asp 








20 










ZD 


JU 


Ala 


Asp 


Ala 


Val 


Arg 


Leu 


Leu 


ASp 


His Ala Arg Ser 


Arg Giy vai Asn 




35 










a n 

H U 




A 

4 3 


Cys 


Leu 


Asp 


Thr 


Ala 


Asp 


Asp 


ASp 


Ser Ala Ser Thr 


ber Aia Gin vai 


50 










33 




ou 




Ala 


Glu 


Glu 


Ser 


Val 


P T • > 

Gly 


Arg 


Trp 


Leu Ala Gly Asp 


inr Giy Arg Arg 


65 










"7 O 






/ 3 


0 u 


Glu 


Glu 


Thr 


Val 


Leu 


ber 


vai 


1 nr 


vai Giy vai rro 


Dy>0 P 1 » r PI pi y% 

trro o±y uiy to-Ln 




















j D 


Val 


Gly 


Gly 


Gly 


P 1 m 

Gly 


Leu 


ber 


Ala 

Aia 


Arg vjin lie u.e 


nia oer ^.ys l»±u 








1UU 










1 nc; 

IUj 




Gly 


Ser 


Leu 


Arg 


Arg 


Leu 


Gly 


vai 


Asp His Val Asp 


vai Lieu nxs L»eu 




113 
















Pro 


Arg 


Val 


Asp 


Arg 


val 


G1U 


Pro 


irp Asp giu vai 


1 rp uin rtia vai 




loU 










1 "3^ 
Ijj 




1 a n 




Asp 


Ala 


Leu 


Val 


Ala 


Ala 


p 1 < • 

Gly 


Lys 


Val Cys Tyr Val 


Giy ber ber Giy 


14 5 










13U 






±33 


l fin 


Phe 


Pro 


Gly 


Trp 


His 


T1 _ 

lie 


vai 


Aia 


nl , pi n pit, u <i 0 
Ala Gin GIU HIS 


Aia vai Arg Arg 










1 cc 
1 DO 








1 1C\ 
1 / V 


-L / 3 


His 


Arg 


Leu 


p i « ■ 

Gly 


Leu 


vai 


Ser 


MIS 


«in Gys Arg iyr 


Tier"* T d n TKr 

/\sp 1j€u inr jci 






1 Bf> 

lOU 










185 


190 


Arg 


His 


Pro 


Glu 


Leu 


Glu 


Val 


Leu 


Pro Ala Ala Gin 


Ala Tyr Gly Leu 




195 










200 




205 


Gly 


Val 


Phe 


Ala 


Arg 


Pro 


Thr 


Arg 


Leu Gly <31y Leu 


Leu Gly Gly Asp 




210 










215 




220 




Gly 


Pro 


Gly 


Ala 


Ala 


Ala 


Ala 


Arg 


Ala Ser Gly Glh 


Pro Thr Ala Leu 


225 










230 






235 


240 


Arg Ser Ala Val Glu Ala Tyr Glu Val Phe Cys Arg Asp Leu Gly Glu
245 250 255

His Pro Ala Glu Val Ala Leu Ala Trp Val Leu Ser Arg Pro Gly Val
260 265 270

Ala Gly Ala Val Val Gly Ala Arg Thr Pro Gly Arg Leu Asp Ser Ala
275 280 285

Leu Arg Ala Cys Gly Val Ala Leu Gly Ala Thr Glu Leu Thr Ala Leu
290 295 300

Asp Gly Ile Phe Pro Gly Val Ala Ala Ala Gly Ala Ala Pro Glu Ala
305 310 315 320

Trp Leu Arg

















<210> 19 
<211> 247 
<212> PRT 

<213> Micromonospora megalomicea 
<400> 19 

Met Asn Thr Trp Leu Arg Arg Phe Gly Ser Ala Asp Gly His Arg Ala 
1 5 10 15

<210> 20
<211> 189
<212> PRT

<213> Micromonospora megalomicea

<210> 21
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic nucleotide DNA duplex 
<400> 21 

taagaattcg gagatctggc ctcagctcta gac 33

<210> 22
<211> 39
<212> DNA

<213> Artificial Sequence
<220>

<223> Complementary oligo
<400> 22 

aattgtctag agctgaggcc agatctccga attcttaat 39 
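
As a reader's aid, the relationship between the two oligonucleotides of SEQ ID NOS: 21 and 22 can be checked with a short script. The following is a minimal sketch, not part of the filed listing: the function name `reverse_complement` and the variable names are illustrative assumptions, and the sequences are copied from the two records above with the spacing removed. It looks for the reverse complement of SEQ ID NO: 21 inside SEQ ID NO: 22 and reports whatever bases of SEQ ID NO: 22 remain unpaired at either end.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    pairs = {"a": "t", "t": "a", "c": "g", "g": "c"}
    return "".join(pairs[base] for base in reversed(seq))


# Sequences copied from the listing (spaces removed); names are illustrative.
seq21 = "taagaattcggagatctggcctcagctctagac"        # SEQ ID NO: 21, 33 nt
seq22 = "aattgtctagagctgaggccagatctccgaattcttaat"  # SEQ ID NO: 22, 39 nt

rc21 = reverse_complement(seq21)
start = seq22.find(rc21)  # -1 would mean the oligos are not complementary

# Bases of SEQ ID NO: 22 left unpaired when the two strands anneal.
overhang_5prime = seq22[:start]
overhang_3prime = seq22[start + len(rc21):]

print(len(seq21), len(seq22))   # declared <211> lengths: 33 and 39
print(start, overhang_5prime, overhang_3prime)
```

Under these assumptions the full length of SEQ ID NO: 21 pairs with SEQ ID NO: 22, leaving a four-base 5' extension (aatt) and a two-base 3' extension (at) on SEQ ID NO: 22. The aatt extension matches the overhang produced by EcoRI digestion, although the listing itself does not state the intended cloning sites.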

<210> 23 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 23 

ttgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 

tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 

cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180

gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 

gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 

aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 

ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420 

tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480 

ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528 

<210> 24 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<400> 24 

ctgcagcggt tgtcggtggc ggtgcgggag gggcgtcggg tgttgggtgt ggtggtgggt 60 

tcggcggtga atcaggatgg ggcgagtaat gggttggcgg cgccgtcggg ggtggcgcag 120 

cagcgggtga ttcggcgggc gtggggtcgt gcgggtgtgt cgggtgggga tgtgggtgtg 180 

gtggaggcgc atgggacggg gacgcggttg ggggatccgg tggagttggg ggcgttgttg 240 

gggacgtatg gggtgggtcg gggtggggtg ggtccggtgg tggtgggttc ggtgaaggcg 300 

aatgtgggtc atgtgcaggc ggcggcgggt gtggtgggtg tgatcaaggt ggtgttgggg 360 

ttgggtcggg ggttggtggg tccgatggtg tgtcggggtg ggttgtcggg gttggtggat 420 

tggtcgtcgg gtgggttggt ggtggcggat ggggtgcggg ggtggccggt gggtgtggat 480 

ggggtgcgtc ggggtggggt gtcggcgttt ggggtgtcgg ggacgaat 528 

<210> 25 
<211> 528 
<212> DNA 

<213> Micromonospora megalomicea 
<220> 
<221> misc_feature
<222> (1)...(528)

<223> Sequence with codon changes as described in the specification at page 99, line 22 thru 101, line 23



<400> 25 

ctgcagcgcc tctccgtcgc cgtccgcgag ggccgccgag tcctcggcgt cgtcgtcggc 60

tcggccgtca accaagacgg cgcgtcaaac ggcctcgccg cgccctccgg cgtcgcccag 120

cagcgcgtca tacgccgcgc gtggggacgc gccggagtat cgggcggcga cgtcggagtc 180

gtcgaggccc acggcaccgg cacccgcctc ggggatcccg tcgagctggg cgccctcctg 240

ggcacgtacg gcgtcggccg cggcggcgtc ggcccggtcg tcgtcggcag cgtcaaggcc 300

aacgtcggcc acgtccaggc cgcggccggc gtcgtcgggg tcatcaaggt cgtcctcggc 360

ctcggccgcg ggctggtcgg cccgatggtc tgccgcggcg gcctcagcgg cctcgtcgac 420

tggtcgtccg gcggcctggt cgtcgcggac ggggtccgcg gctggccggt cggcgtcgac 480

ggcgtccgcc ggggcggcgt ctcggcgttc ggcgtcagcg ggacgaat 528

<210> 26
<211> 291
<212> DNA

<213> Micromonospora megalomicea



<400> 26 

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60 

gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120 

ggtgatggtg tcgttggcgc ggttgtggcg gtggtgtggg gttgtgcctg cggcggtggt 180 

gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240

tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291 



<210> 27
<211> 291
<212> DNA

<213> Micromonospora megalomicea



<400> 27 

ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga 60 

gggtcggtcg ggtgcgccgt cgttggatcg ggtggatgtg gtgcagccgg tgttgttcgt 120 

ggtgatggtg tcgttggcgc ggttgtggcg gtggtgtggg gttgtgcctg cggcggtggt 180 

gggtcattcg cagggggaga tcgcggcggc ggtggtggcg ggggtgttgt cggtgggtga 240 

tggtgcgcgg gtggtggcgt tgcgggcgcg ggcgttgcgg gcgttggccg g 291 

<210> 28 
<211> 291 
<212> DNA 

<213> Micromonospora megalomicea 
<220> 

<221> misc_feature
<222> (1)...(291)

<223> Sequence with codon changes as described in the specification at page 99, line 22 thru page 101, line 23

<400> 28 

cgtggagtgc gatgcggtcg tgtcgagcgt cgtcggcttc agcgtgctgg gcgtcctgga 60 

gggccgcagc ggcgccccga gcctggaccg cgtcgacgtg gtccagccgg tcctgttcgt 120 

ggtcatggtc agcctggccc gcctgtggcg ctggtgcggc gtggtcccgg ccgccgtggt 180 

cggccacagc cagggcgaga tcgccgccgc ggtcgtggcc ggcgtcctga gcgtcggcga 240

cggcgcccgc gtcgtggccc tgcgcgcccg cgccctgcgc gccctggccg g 291 
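
The <223> note for SEQ ID NO: 28 says its codons were changed relative to the native fragment. One way to confirm that such changes are silent is to translate both sequences and compare the results. The sketch below is a reader's aid rather than part of the filed listing, and it rests on two assumptions: the standard genetic code applies, and the reading frame starts at the second nucleotide (offset 1), which was chosen because it reproduces the Val Glu Cys Asp Ala Val Val Ser Ser stretch at residues 2072 to 2080 of SEQ ID NO: 15. The names `translate`, `seq27_head` and `seq28_head` are illustrative; only the first 60 nucleotides of each record are used.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str, offset: int = 0) -> str:
    """Translate complete codons of `dna`, starting at `offset` (0-based)."""
    dna = dna.replace(" ", "").lower()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# First 60 nt of each record, copied from the listing.
seq27_head = "ggtggagtgt gatgcggtgg tgtcgtcggt ggtggggttt tcggtgttgg gggtgttgga"
seq28_head = "cgtggagtgc gatgcggtcg tgtcgagcgt cgtcggcttc agcgtgctgg gcgtcctgga"

native = translate(seq27_head, offset=1)    # assumed frame, see lead-in
modified = translate(seq28_head, offset=1)

print(native)
print(modified)
print(native == modified)
```

With the assumed frame, both fragments give the same one-letter peptide (VECDAVVSSVVGFSVLGVL) over this window, which is consistent with the codon changes being synonymous. The same function can be applied to the full 291-nucleotide records or, with offset 0, to SEQ ID NOS: 24 and 25.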

<210> 29 
<211> 24 
<212> DNA 
<213> Artificial Sequence
<220> 

<223> PCR primer 
<400> 29 

gaacaactcc tgtctgcggc cgcg 24 

<210> 30 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 30 

cggaattctc tagagtcacg tctccaaccg cttgtcgagg 40

<210> 31 
<211> 51 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 31 

tctagactta attaaggagg acacatatga gcgagagcag cggcatgacc g 51 

<210> 32 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> PCR primer 
<400> 32 

aacgcctccc aggagatctc cagca 25 

<210> 33 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 
<400> 33 

aattcatagc ctaggt 16 

<210> 34 

<211> 16 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo 

<400> 34 